Differences Between InfiniBand and Ethernet Networks

NADDOD Abel InfiniBand Expert Feb 2, 2023

What Are InfiniBand and Ethernet?

InfiniBand and Ethernet, as two important interconnection technologies, each have their own characteristics, and neither can simply be declared better or worse than the other. They continue to develop and evolve in their respective fields of application and have become indispensable parts of our networked world.

InfiniBand is an open-standard, high-bandwidth, low-latency, and highly reliable network interconnection technology. The technology is defined by the IBTA (InfiniBand Trade Association) and is widely used in supercomputer clusters; with the rise of artificial intelligence, it has also become the network interconnect of choice for GPU servers. Currently, the latest widely deployed InfiniBand product is 200Gb/s HDR from NVIDIA Mellanox, which provides end-to-end bandwidth of up to 200Gbps, bringing an unparalleled network experience to high-performance computing, artificial intelligence, and other fields and unleashing the maximum computing potential of the cluster.

Ethernet refers to the baseband LAN specification created by Xerox and jointly developed by Xerox, Intel, and DEC. The original Ethernet standard was introduced on September 30, 1980, and Ethernet remains the most common communication protocol standard in today's LANs. Its technical standards are now developed by the IEEE 802.3 working group, which has released specifications for 100GE, 200GE, and 400GE Ethernet interfaces, the highest-rate Ethernet transmission technologies currently available.

What Are the Differences Between InfiniBand and Ethernet?

It goes without saying that Ethernet and IP technology form the cornerstone of global interconnection: people and intelligent devices alike rely on Ethernet to connect with one another. This reflects Ethernet's original design goal of broad compatibility, allowing different systems to interoperate, and after decades of development it has become the de facto standard of the Internet.

InfiniBand, on the other hand, was developed as a standard to break the data transmission bottleneck in high-performance computing clusters and was positioned for high-end applications from the start. InfiniBand differs fundamentally from Ethernet, mainly in terms of bandwidth, latency, network reliability, and networking method.

InfiniBand vs Ethernet: Bandwidth

From its birth, InfiniBand's speed evolved faster than Ethernet's for a long time, mainly because InfiniBand is used for interconnecting servers in high-performance computing, while Ethernet is oriented more toward interconnecting end devices, where bandwidth demand is not as high. As a result, the Ethernet standards bodies mainly considered how to achieve interoperability when designing their standards. The InfiniBand standard, however, considers not only interoperability but also how to reduce the load on the CPU during high-speed transmission, so that bandwidth is used efficiently while occupying as few CPU resources as possible.

For high-speed network traffic above 10Gbps, having the CPU encapsulate and decapsulate every packet consumes a great many resources; spending expensive CPU cycles on the simple task of network transmission is a waste from the perspective of resource allocation. Since the first generation of InfiniBand, SDR, already ran at 10Gbps, this had to be taken into account. Drawing on the DMA idea of bypassing the CPU, the InfiniBand standard defined RDMA, which not only improves data transmission bandwidth but also reduces the burden on the CPU, offloading it during high-speed transmission while improving network utilization. This allowed InfiniBand to iterate rapidly from SDR 10Gbps through DDR 20Gbps, QDR 40Gbps, FDR 56Gbps, and EDR 100Gbps to today's HDR 200Gbps, and thanks to RDMA the CPU does not have to sacrifice ever more resources for network processing and slow down the overall HPC workload as rates rise. Faster InfiniBand speeds, such as 400Gbps NDR and 800Gbps XDR, are in development and will be launched soon.
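To make the RDMA idea concrete, here is a minimal sketch using the libibverbs API from rdma-core, the user-space verbs library commonly used with InfiniBand (and RoCE) adapters. It only opens the first adapter, allocates a protection domain, and registers a buffer so the NIC can read and write that memory directly without the CPU copying data; queue pair creation, connection setup, and the actual transfers are omitted, and the build command in the comment is an assumption about a typical Linux setup.

/* Minimal sketch of the RDMA idea using libibverbs (rdma-core).
 * It only opens an HCA, allocates a protection domain, and registers
 * a buffer so the NIC can DMA into it directly, bypassing the CPU
 * data path. Queue pairs, connection exchange, and transfers are omitted.
 * Build (assumed): gcc rdma_sketch.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    /* Open the first HCA (e.g. an InfiniBand adapter). */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { perror("ibv_open_device"); return 1; }

    /* A protection domain groups resources allowed to work together. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) { perror("ibv_alloc_pd"); return 1; }

    /* Register a buffer: the memory is pinned and the HCA receives keys,
     * so later sends/writes move data NIC<->memory without CPU copies. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
           len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}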

InfiniBand vs Ethernet: Latency

We can compare the latency of InfiniBand and Ethernet from two angles. The first is the network switch. As a layer 2 transport device, the Ethernet switch generally uses MAC lookup-table addressing and store-and-forward switching (some products have borrowed InfiniBand's cut-through technique). Because it must also handle complex services such as IP, MPLS, and QinQ, its processing pipeline is long, typically several microseconds (more than 200ns even for models that support cut-through). An InfiniBand switch, by contrast, does very simple layer 2 processing: it only needs to look up the forwarding path based on a 16-bit LID, and with cut-through forwarding its latency stays far below that of an Ethernet switch.

The second is the network adapter card (NIC). As mentioned above, with RDMA the NIC does not need to involve the CPU to forward messages, which greatly reduces the encapsulation and decapsulation delay. The send/receive latency (write, send) of InfiniBand NICs is typically around 600ns, while the send/receive latency of TCP/UDP applications over Ethernet is around 10us, much larger than that of InfiniBand.
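Combining these NIC figures with the per-switch latencies quoted later in this article (roughly 200ns per InfiniBand switch hop versus 500ns or more per Ethernet switch hop), a back-of-the-envelope estimate over an assumed three-switch-hop path illustrates where the end-to-end gap comes from; the hop count and exact numbers are assumptions for illustration only.

/* Back-of-the-envelope end-to-end latency estimate using the
 * illustrative per-hop figures quoted in this article and an
 * assumed 3-switch-hop path (e.g. leaf-spine-leaf). */
#include <stdio.h>

int main(void)
{
    const double ib_nic_ns  = 600.0;    /* per NIC end, InfiniBand RDMA   */
    const double ib_sw_ns   = 200.0;    /* per InfiniBand switch hop      */
    const double eth_nic_ns = 10000.0;  /* per end, TCP/UDP over Ethernet */
    const double eth_sw_ns  = 500.0;    /* per Ethernet switch hop        */
    const int hops = 3;

    double ib_total  = 2 * ib_nic_ns  + hops * ib_sw_ns;
    double eth_total = 2 * eth_nic_ns + hops * eth_sw_ns;

    printf("InfiniBand : %.1f us\n", ib_total  / 1000.0);
    printf("Ethernet   : %.1f us\n", eth_total / 1000.0);
    /* Prints roughly 1.8 us vs 21.5 us under these assumptions. */
    return 0;
}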

InfiniBand vs Ethernet: Reliability

In the field of high-performance computing, packet loss and retransmission have a significant impact on overall performance, so a highly reliable network protocol is needed that guarantees losslessness at the mechanism level. InfiniBand provides this with its own flow control mechanism: because the network does not become congested, traffic can be transmitted over InfiniBand networks without buffer accumulation, delay jitter is kept to a minimum, and the result is a truly lossless, clean network.

The network built with Ethernet, however, has no scheduling-based flow control mechanism, so there is no guarantee that the far end will not be congested when a message is sent out. To absorb sudden bursts of traffic, a switch must therefore reserve buffer space of tens of MB to temporarily store these messages, and implementing that buffer occupies a great deal of chip area, making an Ethernet switch chip of the same specification noticeably larger than an InfiniBand switch chip, more expensive, and more power-hungry. In addition, because there is no end-to-end flow control mechanism, the network can drop packets due to buffer congestion in slightly extreme cases, causing large fluctuations in data forwarding performance.


InfiniBand vs Ethernet: Networking

The Ethernet networking model relies on IP together with the ARP protocol to generate MAC table entries automatically, which requires every server in the network to send messages at regular intervals to keep the entries up to date. This wastes considerable network bandwidth, so a VLAN mechanism has to be introduced to divide the network into virtual segments and limit its size. Moreover, the shortcomings of Ethernet's table-learning mechanism can create forwarding loops, so protocols such as STP must be introduced to guarantee loop-free forwarding paths, which further increases configuration complexity. In addition, with the rise of SDN, Ethernet, which was designed around compatibility, does not natively carry SDN traits, so deploying SDN on an Ethernet network requires changing the packet format (VXLAN) or the switch forwarding mechanism (OpenFlow).

InfiniBand, by contrast, was born with the SDN concept: each InfiniBand layer 2 network has a subnet manager that assigns each node an ID (the Local ID, or LID), computes the forwarding path information centrally on the control plane, and pushes it down to the InfiniBand switches. An InfiniBand layer 2 network can therefore be brought up without any manual configuration, avoids flooding, and eliminates the need for VLANs and loop-breaking protocols, making it easy to deploy a large layer 2 network spanning tens of thousands of servers. This is something Ethernet cannot match.
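As a small illustration of the LID-based addressing described above, the following sketch (assuming a Linux host with rdma-core installed and an active port 1 on the first InfiniBand adapter) uses libibverbs to query the port and print the LID assigned by the subnet manager.

/* Print the LID that the subnet manager assigned to port 1 of the
 * first RDMA device. Assumes rdma-core is installed; link with -libverbs. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { perror("ibv_open_device"); return 1; }

    struct ibv_port_attr attr;
    if (ibv_query_port(ctx, 1, &attr)) {   /* port numbers start at 1 */
        perror("ibv_query_port");
        return 1;
    }

    /* On InfiniBand the subnet manager assigns this 16-bit LID;
     * it is what switches use to look up the forwarding path. */
    printf("device %s, port 1: LID 0x%04x, state %d\n",
           ibv_get_device_name(devs[0]), attr.lid, attr.state);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}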

Although the points above highlight the advantages of InfiniBand, it is undeniable that Ethernet supports a broad range of rates from 10Mbps to 400Gbps and a wide range of lower-cost equipment, which keeps it widely used in most scenarios. Because its compatibility allowed RDMA to be extended onto it (RoCE), Ethernet now offers not only high bandwidth but also the beginnings of lossless-network behavior, and it has therefore also appeared in high-performance computing in recent years, which in its own way demonstrates Ethernet's remarkable adaptability.


InfiniBand or RoCE: Which Network Should You Choose?

Choosing the right network for your data center is an important question. In single-machine multi-GPU setups, communication is usually handled over PCIe or NVLink within the machine, while multi-machine multi-GPU setups require choosing between InfiniBand (IB) and RoCE networks. To deliver large-scale computing power, very large clusters need high-speed communication within each machine and low-latency, high-bandwidth communication between machines.
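For GPU clusters that use NCCL for inter-node collectives, the choice between IB and RoCE is usually steered with environment variables rather than code changes. The sketch below is only an illustration: select_fabric is a hypothetical helper, the device name mlx5_0 and the GID index are placeholders for your own fabric, and the variables must be set before NCCL is initialized.

/* Illustrative only: steer NCCL's inter-node transport before the
 * process initializes NCCL. The device name and GID index below are
 * placeholders; consult your fabric configuration for real values. */
#include <stdlib.h>

void select_fabric(int use_roce)
{
    /* 0 = allow the IB/RoCE verbs transport, 1 = fall back to TCP sockets. */
    setenv("NCCL_IB_DISABLE", "0", 1);

    /* Restrict NCCL to a specific HCA (placeholder device name). */
    setenv("NCCL_IB_HCA", "mlx5_0", 1);

    if (use_roce) {
        /* RoCE typically needs an explicit GID index (e.g. the RoCEv2 GID). */
        setenv("NCCL_IB_GID_INDEX", "3", 1);
    }
}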

According to user statistics from the Top500 list of supercomputing centers, IB networks play a significant role among the top 10 and top 100 systems. OpenAI, for example, trained ChatGPT on an IB network built within Microsoft Azure, which has led to a surge in demand for large-scale supercomputing deployments.

InfiniBand in Top500 supercomputing centers

On the other hand, Microsoft Azure is also a pioneer of the DCQCN congestion control algorithm for RoCE and has years of practical engineering experience building large RoCE networks on SONiC white-box switches.

Why InfiniBand Is the Superior Choice Compared to RoCE

When it comes to choosing a high-speed network for your data center, InfiniBand (IB) and RoCE are two of the most popular options. However, InfiniBand has several advantages that make it the superior choice:

Low Latency

InfiniBand has a dedicated end-to-end network, with each IB switch having a latency of approximately 150-200ns, which is significantly lower than traditional Ethernet switches (500ns or more). This latency advantage becomes even greater in large networks with multiple hops.

Natively Lossless Network

Both IB and RoCE are carriers of RDMA applications, but InfiniBand's built-in, credit-based flow control makes it easy to achieve a lossless network. In contrast, RoCE has to rely on additional end-to-end flow control and congestion control mechanisms, which significantly increase network complexity and maintenance costs.

Hardware Acceleration

IB switches and network cards include unique acceleration technologies such as SHARP (in-network aggregation), Adaptive Routing, and SHIELD (self-healing networking), which give them a significant advantage over Ethernet switches in HPC and AI scenarios, especially in large-scale deployments.

More Flexible Physical Topology

An independent IB subnet can directly support 40,000 nodes without broadcast storms and can be built on physical topologies such as Fat Tree, DragonFly, DragonFly+, and Torus3D; even Google's Aquila network, released in 2022, uses a DragonFly-style approach. Fat Tree in particular forms an efficient, easily expanded two-layer/three-hop or three-layer/five-hop topology with essentially no limit on cluster scale, and it remains the most widely used topology for both IB and Ethernet networks.
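For a rough sense of how a fat tree scales, the standard folded-Clos formulas give k*k/2 hosts for a two-tier tree and k*k*k/4 hosts for a three-tier tree built from radix-k switches; the 40-port radix below is an assumed example, not a statement about any particular product.

/* Rough fat-tree sizing with the standard folded-Clos formulas:
 * a 2-tier fat tree of radix-k switches supports k*k/2 hosts,
 * a 3-tier fat tree supports k*k*k/4 hosts. The radix is an
 * assumed example (a 40-port switch). */
#include <stdio.h>

int main(void)
{
    int k = 40;                       /* switch radix (ports per switch) */
    int two_tier   = k * k / 2;       /* 800 hosts   */
    int three_tier = k * k * k / 4;   /* 16000 hosts */

    printf("radix %d: 2-tier (3 hops max) supports %d hosts\n", k, two_tier);
    printf("radix %d: 3-tier (5 hops max) supports %d hosts\n", k, three_tier);
    return 0;
}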

The Pros and Cons of InfiniBand for Your Data Center

While InfiniBand (IB) offers several advantages over other high-speed networks, it’s important to consider the potential drawbacks as well. Here are some of the cons of using InfiniBand:

Limited Availability

Currently, NVIDIA is the only vendor that provides end-to-end IB products.

High Management Costs

Many users have limited knowledge of IB network technology, and configuring and monitoring IB switches and network cards typically requires command-line interfaces. However, the latest UFM network management software is gradually addressing these issues and provides third-party API interfaces for monitoring software such as Prometheus and Grafana.
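A common pattern for feeding IB fabric data into Prometheus or Grafana is a small exporter that periodically polls UFM's REST API and republishes the values. The sketch below uses libcurl to issue one such request; the host name, credentials, and the /ufmRest/resources/systems path are assumptions that should be checked against the REST API reference of your UFM release.

/* Hedged sketch of polling a UFM REST endpoint with libcurl.
 * The URL path and credentials are placeholders/assumptions; check the
 * REST API reference of your UFM release for the exact resources.
 * Build (assumed): gcc ufm_poll.c -lcurl */
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    /* Placeholder UFM host and resource path. */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://ufm.example.local/ufmRest/resources/systems");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password"); /* placeholder */
    /* UFM appliances often ship self-signed certs; relax verification
     * only in a lab, never in production. */
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);

    /* With no write callback set, libcurl prints the response body to
     * stdout; an exporter would parse it and publish metrics instead. */
    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}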

Limited Use Cases in Public Cloud

There are few practical applications of IB networks in multi-tenant and virtualized environments in public clouds.

Despite these challenges, both IB and RoCE networks remain viable options for small GPU clusters, each with its own advantages and disadvantages; depending on your company's budget and technical expertise, you can choose the option that best fits your needs. For large-scale clusters with more than 10,000 cards, IB networks are the more reliable choice, while RoCE networks require relatively more investment in people and resources.

InfiniBand is an excellent option for companies looking for a high-speed network with low latency and high reliability. However, it's important to weigh the pros and cons carefully and choose the best option for your specific needs. With the right network in place, you can achieve faster and more efficient communication between nodes, which is essential for large-scale computing, HPC, and AI applications.

What is UFM (Unified Fabric Manager)?

NVIDIA Mellanox UFM is a powerful platform, created by Mellanox, for managing scale-out InfiniBand and Ethernet computing environments. With UFM, data center operators can provision, monitor, and operate their data center fabric more effectively, ultimately improving application performance and ensuring that the fabric is always operational.

UFM utilizes an innovative "application-centric" approach to bridge servers, applications, and fabric elements. Its fabric model allows users to manage the fabric as a set of related entities, much like managing real-time applications and services, and enables fabric monitoring and performance optimization at the level of application logic rather than individual ports or devices.

UFM includes an advanced, granular monitoring engine that provides real-time access to data across the fabric. It also includes a unique congestion-tracking feature that quickly identifies transmission bottlenecks and congestion events occurring in the fabric, allowing problems to be pinpointed more accurately and resolved faster.

UFM can identify virtual servers and treat them much like physical servers. Its policy engine makes fabric and policy settings available to virtual servers as well, and even during VM migration UFM maintains fabric and policy integrity, providing a consistent environment for migrating applications.

UFM takes the fabric topology and the characteristics of active applications into account, providing routing algorithms optimized for HPC workloads. Additionally, UFM can partition the fabric into multiple independent segments to increase the security and predictability of traffic between applications.

Mellanox's management solution lets users manage fabrics from small to very large and monitor fabric performance at the application logic level. UFM offers increased visibility into fabric performance and potential bottlenecks, performance optimization through an application-centric approach, faster fault detection and repair, efficient management of dynamic environments, and more efficient use of fabric resources.

UFM platform (image source: NVIDIA)

Composition of UFM platform

The UFM platform comprises multiple solution tiers and a comprehensive feature set, making it well suited to even the most demanding modern scale-out data center requirements. To further extend the UFM product portfolio, NVIDIA has introduced the UFM Telemetry platform, described in the sections below.

UFM Telemetry: Real-Time Monitoring

The UFM Telemetry platform provides network validation tools, monitors network performance and conditions, and captures rich real-time network telemetry, application workload usage, and system configuration information, streaming it to on-premises or cloud-based databases for further analysis.

Platform: Software containers or dedicated devices.
Key features include:
● Switch, adapter, and cable telemetry
● System validation
● Network performance testing
● Streaming telemetry information to user-built or cloud-based databases

UFM Enterprise: Network Visualization and Control

UFM Enterprise is the latest addition to the Mellanox UFM platform, combining the power of UFM Telemetry with advanced network monitoring and management capabilities. The platform offers a range of features that enable automated network discovery and provisioning, traffic monitoring and congestion detection, job scheduling and deployment, secure cable management, and integration with leading job schedulers and cloud and cluster managers such as Slurm and Platform Load Sharing Facility (LSF).
Platform: Software container or dedicated device
Key Features:
● Includes UFM Telemetry functionality
● Automated network discovery and validation
● Secure cable management
● Congestion tracking to diagnose traffic bottlenecks
● Problem identification and resolution
● Global software updates
● Integration with Slurm and Platform LSF and support for job scheduler deployment
● Advanced reporting and rich REST API
● Web-based GUI

UFM Cyber-AI: Revolutionizing Network Intelligence and Analysis

UFM Cyber-AI is a powerful platform that enhances the advantages of UFM Telemetry and UFM Enterprise, providing predictive maintenance and network security to reduce supercomputing operating expenses.
Platform: Local dedicated UFM Cyber-AI device
Key Features:
● Includes UFM Telemetry and UFM Enterprise functionality
● Analyzes performance degradation and application usage patterns over time
● Detects anomalous cluster behavior
● Establishes correlations between seemingly unrelated phenomena using AI
● Reports predictive maintenance alerts
● Optimizes predictability through continuous system data collection

Integrating with existing data center management tools is easy with UFM. UFM provides an open and scalable object model to describe data center infrastructure and perform all related management operations. UFM’s REST API supports integration with leading job schedulers, cloud and cluster managers (including Slurm and Platform LSF), and offers network provisioning and integration with OpenStack, Azure Cloud, and VMware.

By combining enhanced real-time network telemetry with AI-based network intelligence and analysis to support horizontally scalable InfiniBand data centers, backed by NVIDIA Mellanox Care services, the NVIDIA Mellanox UFM platform product suite can fundamentally transform your supercomputing data center network management, saving operational costs and maintaining customer satisfaction.

Summary

Ethernet and InfiniBand are two different interconnection technologies, each with its own distinctive features and strengths, and they continue to evolve in their respective application areas to enhance network performance and improve the interconnection experience.

InfiniBand, RoCE, and Ethernet are all powerful networking technologies, each with its own unique advantages. With the help of Mellanox's Unified Fabric Manager (UFM), managing and optimizing these networks becomes easier than ever before.

If you would like to learn more about UFM, please visit the Naddod product website for information on Cluster Management - UFM Test Installation Guide.

Related Resources:
What Is InfiniBand and How Is It Different from Ethernet?
NADDOD High-Performance Computing (HPC) Solution
Why Is InfiniBand Used in HPC?
InfiniBand Network Technology for HPC and AI: In-Network Computing
What Is InfiniBand Network and Its Architecture?