
What Is InfiniBand and HDR and Why Is IB Used for Supercomputers?

By Abel, InfiniBand Expert at NADDOD | Dec 27, 2022

What Is InfiniBand Network?

InfiniBand (abbreviated as IB) is a computer networking communication standard defined by the IBTA (InfiniBand Trade Association). InfiniBand has been widely used in high-performance computing (HPC) because it provides extremely high throughput, high bandwidth, and low latency for network transmission.

InfiniBand is used for data connections within and between computing systems. Through direct connections or interconnection via network switches, InfiniBand provides high-performance networking for server-to-server, server-to-storage, and storage-to-storage data transmission. An InfiniBand network can be scaled out through switched fabrics to meet networking needs at various scales. With the rapid development of scientific computing, artificial intelligence (AI), and cloud data centers, InfiniBand is increasingly preferred in HPC and supercomputing applications for end-to-end high-performance networking.

InfiniBand Is Widely Used in Supercomputers and HPC Data Centers

As early as June 2015, InfiniBand accounted for 51.8% of the systems in the Top500 list of the world's most powerful supercomputers, a 15.8% year-on-year increase.

TOP500 Interconnect Trends, June 2015. Image source: infinibandta.org

In the Top500 list of June 2022, InfiniBand networks once again topped the list of supercomputer interconnects with clear leads in both system count and performance, a significant increase from the previous list. From this list, three trends can be summarized.

  • Supercomputers based on InfiniBand networks lead other network technologies by a wide margin, with 189 systems overall. InfiniBand-based machines are especially dominant among the Top 100, with 59 systems, making InfiniBand the de facto standard for performance-conscious supercomputers.
  • NVIDIA GPU and networking products, such as the NVIDIA Volta GV100 and A100 GPUs, Mellanox HDR Quantum QM87xx switches, and BlueField DPUs, are the dominant interconnect components in the Top500: more than two-thirds of the systems use NVIDIA Mellanox networking, whose performance and technology leadership is widely recognized.
  • It is also worth noting that InfiniBand networks are widely used not only in traditional HPC workloads but also in enterprise-class data centers and public clouds. NVIDIA Selene, the top-performing enterprise supercomputer, and Microsoft's Azure public cloud both leverage InfiniBand networks to deliver superb business performance.

Whether it is the evolution of data communication technology, the innovation of Internet technology, or the upgrade of visual presentation, all of them depend on more powerful computing, larger and more secure storage, and more efficient networks. Cluster architectures built on InfiniBand networks not only provide higher-bandwidth network services, but also reduce the computing resources consumed by network transmission, lower latency, and integrate HPC tightly with the data center.

In the latest Top500 list of Nov. 2022, InfiniBand continued its leadership among the Top 500 high-performance computing systems and kept growing. Why are InfiniBand networks so highly valued in the Top500? Their performance benefits play a decisive role. NADDOD summarizes the top 10 advantages of InfiniBand as follows.


Top 10 Advantages of InfiniBand Networks

As a future-proof standard for high-performance computing (HPC), InfiniBand technology is highly valued for HPC connectivity between supercomputers, storage, and even LAN networks. InfiniBand has clear advantages in many areas, such as simplified management, high bandwidth, full CPU offload, ultra-low latency, cluster scalability and flexibility, QoS, and SHARP support. The following sections explain the top 10 advantages of InfiniBand and how it differs from Ethernet.

1. Simple Network Management
InfiniBand is the first network architecture that is truly designed natively for SDN and is managed by a subnet manager.

The subnet manager configures the local subnet and ensures its continuous operation. All channel adapters and switches must implement an SMA (subnet management agent) that works with the subnet manager to handle management traffic. Each subnet needs at least one subnet manager for initial setup and for reconfiguration when links are connected or disconnected. An arbitration mechanism selects one subnet manager as the master, while the other subnet managers work in standby mode (each standby subnet manager keeps a backup of the subnet topology and verifies that the subnet is operational). If the master subnet manager fails, a standby subnet manager takes over management of the subnet to ensure uninterrupted operation.
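To make the arbitration and failover behavior concrete, here is a minimal Python sketch. It assumes the commonly used election rule (the subnet manager with the highest configured priority wins, with ties broken by the lowest GUID); the class and field names are illustrative and not taken from any real subnet manager implementation.

```python
# Simplified sketch of master subnet-manager election and failover,
# assuming the usual rule: highest priority wins, ties broken by lowest GUID.
from dataclasses import dataclass

@dataclass
class SubnetManager:
    guid: int
    priority: int      # higher value = preferred as master
    alive: bool = True

def elect_master(sms):
    candidates = [sm for sm in sms if sm.alive]
    # Highest priority wins; ties are broken by the lowest GUID.
    return max(candidates, key=lambda sm: (sm.priority, -sm.guid))

sms = [SubnetManager(guid=0x10, priority=5),
       SubnetManager(guid=0x20, priority=5),
       SubnetManager(guid=0x30, priority=1)]

master = elect_master(sms)            # GUID 0x10 becomes master
master.alive = False                  # the master fails...
print(hex(elect_master(sms).guid))    # ...and standby 0x20 takes over
```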


2. Higher Bandwidth
Since the birth of InfiniBand, its data rate has for a long time grown faster than Ethernet's, mainly because InfiniBand is used for interconnects between servers in high-performance computing, which demands higher bandwidth. Back in 2014, the dominant InfiniBand speeds were 40Gb/s QDR and 56Gb/s FDR. Today, 100Gb/s EDR and 200Gb/s HDR are deployed in many supercomputers worldwide. And with the launch of OpenAI's ChatGPT, many businesses have started to consider deploying the newest 400Gb/s NDR InfiniBand products, such as InfiniBand NDR switches and optical connectivity cables, in their HPC systems.


The abbreviations for each InfiniBand speed generation (quoted for the standard 4x link width) are as follows; a short calculation showing how these per-port figures come about appears after the list:

  • SDR - Single Data Rate, 10Gb/s (8Gb/s effective data rate with 8b/10b encoding).
  • DDR - Double Data Rate, 20Gb/s (16Gb/s effective).
  • QDR - Quad Data Rate, 40Gb/s (32Gb/s effective).
  • FDR - Fourteen Data Rate, 56Gb/s.
  • EDR - Enhanced Data Rate, 100Gb/s.
  • HDR - High Data Rate, 200Gb/s.
  • NDR - Next Data Rate, 400Gb/s.
  • XDR - eXtreme Data Rate, 800Gb/s.
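As a quick sanity check on these figures, the sketch below multiplies a nominal per-lane rate by the standard four lanes per port. The per-lane values are the commonly quoted nominal figures and are listed here as assumptions (FDR lanes actually signal at 14.0625 Gb/s); SDR/DDR/QDR use 8b/10b encoding, so their usable data rate is 80% of the line rate shown.

```python
# The marketed InfiniBand speeds are per-port figures for the common 4x
# link width (four lanes). This sketch simply multiplies a nominal
# per-lane rate by the lane count; values are illustrative assumptions.
LANE_GBPS = {"SDR": 2.5, "DDR": 5, "QDR": 10, "FDR": 14,
             "EDR": 25, "HDR": 50, "NDR": 100, "XDR": 200}
LANES = 4

for gen, lane_rate in LANE_GBPS.items():
    print(f"{gen}: {lane_rate:g} Gb/s/lane x {LANES} lanes = "
          f"{lane_rate * LANES:g} Gb/s per port")
```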

3. CPU Offload
A key technology for accelerated computing is CPU offload. The InfiniBand network architecture allows data to be transferred with minimal CPU involvement, which is accomplished by the following (a rough analogy is sketched after the list):

  • Hardware offload of the entire transport-layer protocol stack.
  • Kernel bypass and zero-copy data transfer.
  • RDMA (remote direct memory access), which writes data from one server's memory directly into another server's memory without CPU involvement.
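The following is not RDMA code; it is a loose analogy, using POSIX shared memory within one host, for the one-sided semantics that an RDMA write provides: the target registers a buffer once and then does no per-message work, while the initiator writes into that buffer directly. The region name is arbitrary.

```python
# NOT real RDMA: a loose analogy for one-sided write semantics.
from multiprocessing import shared_memory

# "Target" side: register a 4 KiB memory region once.
region = shared_memory.SharedMemory(create=True, size=4096, name="mr_demo")

# "Initiator" side (could be another process): attach to the same region
# and write into it directly -- the target never calls a receive function.
peer = shared_memory.SharedMemory(name="mr_demo")
peer.buf[:5] = b"hello"
peer.close()

print(bytes(region.buf[:5]))  # b'hello' arrived without any recv() on the target
region.close()
region.unlink()
```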


It is also possible to use GPUDirect technology, which allows direct access to data in GPU memory and transfers data from GPU memory to other nodes. This accelerates computational applications such as artificial intelligence (AI), deep learning training, and machine learning.


4. Lower Latency
The latency difference between InfiniBand and Ethernet can be split into two parts. The first is at the switch level. As Layer 2 devices in the network transport model, Ethernet switches generally use MAC-table lookup and store-and-forward (some products have adopted InfiniBand's cut-through technique). Because they must also handle complex services such as IP, MPLS, and QinQ, Ethernet switches have a long processing pipeline, and their latency is typically measured in microseconds (over 200ns even for models that support cut-through). In contrast, Layer 2 processing in InfiniBand switches is very simple: the forwarding path is looked up using only the 16-bit LID, and cut-through forwarding shortens the forwarding delay to under 100ns, much faster than Ethernet switches.

At the network interface card (NIC) level, as mentioned earlier, RDMA means the NIC does not need to go through the CPU to forward messages, which greatly reduces the delay of encapsulating and decapsulating messages. A typical InfiniBand NIC send/receive (write, send) latency is around 600ns, whereas TCP/UDP applications over Ethernet see send/receive latencies of roughly 10us, a difference of more than ten times between InfiniBand and Ethernet latency.
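A back-of-the-envelope calculation helps show where the store-and-forward penalty comes from; the frame size and link speed below are arbitrary example values, not measurements.

```python
# Why store-and-forward costs more than cut-through: a store-and-forward
# switch must receive the whole frame before forwarding it, so the frame's
# serialization delay is paid at every hop; a cut-through switch starts
# forwarding once the header has been read.
FRAME_BYTES = 1500   # a typical Ethernet MTU-sized frame
LINK_GBPS = 100      # example link speed

serialization_ns = FRAME_BYTES * 8 / (LINK_GBPS * 1e9) * 1e9
print(f"Store-and-forward: ~{serialization_ns:.0f} ns per hop just to "
      f"receive the frame, before any lookup happens")
```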


5. Scalability and Flexibility
A major advantage of the InfiniBand network is that a single subnet can deploy around 48,000 nodes to form a huge Layer 2 network. Moreover, InfiniBand networks do not rely on broadcast mechanisms such as ARP, so they do not generate broadcast storms or waste additional bandwidth.

Multiple InfiniBand subnets can also be interconnected via InfiniBand routers.

InfiniBand technology supports multiple network topologies.

At small scale, a 2-level fat-tree topology is recommended; at larger scale, a 3-level fat-tree can be used; and above a certain scale, a Dragonfly+ topology can be used to save some cost.
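For a sense of scale, the following sketch applies the standard fat-tree sizing formulas, assuming non-blocking (1:1) trees built from 40-port switches (the radix of the QM87xx HDR switches described later).

```python
# Rough fat-tree sizing for non-blocking (1:1) trees of RADIX-port switches.
# Standard results: a 2-level fat tree supports RADIX^2 / 2 end nodes,
# a 3-level fat tree supports RADIX^3 / 4.
RADIX = 40

two_level = RADIX ** 2 // 2    # each leaf uses half its ports down, half up
three_level = RADIX ** 3 // 4

print(f"2-level fat tree of {RADIX}-port switches: up to {two_level} nodes")
print(f"3-level fat tree of {RADIX}-port switches: up to {three_level} nodes")
```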


6. InfiniBand Network Provides QoS Support
How does an InfiniBand network provide QoS support if several different applications are running on the same subnet and some of them need higher priority than others?

QoS is the ability to provide different priority services for different applications, users or data flows. High-priority applications can be mapped to different port queues, and messages in the queue can be sent first.

InfiniBand implements QoS using Virtual Lanes (VLs), which are discrete logical communication links that share a single physical link. Each physical link can support up to 15 standard virtual lanes (VL0-VL14) plus one management lane (VL15).
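The sketch below illustrates the idea of mapping service levels to virtual lanes and draining lanes in priority order; it is a conceptual sketch, not the real InfiniBand VL arbitration table format, and the mapping and priorities are made-up example values.

```python
# Conceptual VL-based prioritization: packets are mapped from a service
# level (SL) to a virtual lane, and lanes are drained in priority order.
from collections import deque

SL_TO_VL = {0: 0, 1: 1, 2: 2}            # service level -> virtual lane
vl_queues = {vl: deque() for vl in range(3)}
VL_PRIORITY = [2, 1, 0]                  # drain VL2 first, then VL1, VL0

def enqueue(packet, sl):
    vl_queues[SL_TO_VL[sl]].append(packet)

def transmit_next():
    for vl in VL_PRIORITY:
        if vl_queues[vl]:
            return vl_queues[vl].popleft()
    return None

enqueue("bulk storage write", sl=0)
enqueue("latency-sensitive MPI message", sl=2)
print(transmit_next())   # the SL2/VL2 message goes out first
```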


7. Stability and Resilience
Ideally, the network is very stable and free of failures. But long-running networks inevitably experience some failures. How does InfiniBand handle these failures and recover quickly?

NVIDIA Mellanox InfiniBand solutions, which include hardware such as InfiniBand switches, NICs, and Mellanox cables, provide a mechanism called Self-Healing Networking, a hardware capability based on InfiniBand switches. Self-Healing Networking allows link failures to be recovered in just 1 millisecond, 5,000 times faster than typical recovery times.


8. Optimized Load Balancing
A very important requirement in a high-performance data center is how to improve the utilization of the network. In an InfiniBand network, one way to do so is load balancing.

Load balancing is a routing strategy that allows traffic to be sent over multiple available ports.

Adaptive Routing (AR) is one such feature: it allows traffic to be distributed evenly across switch ports. AR is supported in switch hardware and is managed by the Adaptive Routing Manager.

When AR is enabled, the Queue Manager on the switch monitors the traffic on all of a group's exit ports, equalizes the load on each queue, and directs traffic to underutilized ports. AR thus supports dynamic load balancing to avoid network congestion and maximize network bandwidth utilization.
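A toy version of the underlying idea: among the exit ports of a group that can reach the destination, pick the least-loaded one. Port names and utilization values below are invented; the real decisions are made in switch hardware.

```python
# Toy adaptive-routing decision: route the next flow through the
# least-utilized candidate exit port.
candidate_ports = {"port1": 0.82, "port2": 0.35, "port3": 0.55}  # utilization

def pick_exit(ports):
    return min(ports, key=ports.get)   # least-utilized exit port

print(pick_exit(candidate_ports))       # -> "port2"
```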

9. In-Network Computing Technology - SHARP
InfiniBand switches also support in-network computing technology: SHARP, the Scalable Hierarchical Aggregation and Reduction Protocol.

SHARP is a centrally managed software package that builds on the switch hardware.

SHARP offloads collective communication that would otherwise run on CPUs or GPUs into the switches, optimizing collective operations, avoiding repeated data transfers between nodes, and reducing the amount of data that must cross the network. As a result, SHARP can greatly improve the performance of accelerated computing for MPI-based applications such as AI and machine learning.
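A toy comparison shows why combining partial results inside the fabric reduces traffic; the node count and buffer size below are arbitrary illustrative values.

```python
# Why in-network reduction cuts traffic, for an all-to-root sum of a
# BUF_MB buffer from N_NODES nodes. Without aggregation, every node's
# buffer must cross the link into the root; with switch-side aggregation
# (the SHARP idea), each link carries only one buffer, because partial
# results are combined inside the switches.
N_NODES = 64
BUF_MB = 100

host_based_root_link = N_NODES * BUF_MB   # every contribution reaches the root
in_network_root_link = BUF_MB             # the fabric delivers one reduced buffer

print(f"Host-based reduction: ~{host_based_root_link} MB into the root link")
print(f"In-network reduction: ~{in_network_root_link} MB into the root link")
```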


10. Network Topologies
As mentioned above, InfiniBand networks can support a very large number of topologies, such as:

  • Fat Tree
  • Torus
  • Dragonfly+
  • Hypercube
  • HyperX

By supporting different network topologies, InfiniBand meets different needs, such as:

  • Easy network scaling
  • Reduced TCO (total cost of ownership)
  • Minimizing the blocking ratio
  • Minimizing latency
  • Maximizing transmission distance

With its unparalleled technical advantages, InfiniBand greatly simplifies high-performance network architecture and reduces the latency introduced by multi-level architectural hierarchies, providing strong support for smoothly upgrading the access bandwidth of critical computing nodes. Thanks to its high bandwidth, low latency, and compatibility with Ethernet, InfiniBand is finding its way into more and more usage scenarios.

InfiniBand HDR Product Solution Introduction

With growing customer requirements, 100Gb/s EDR is gradually being phased out of the market, while the 400Gb/s NDR data rate is still adopted mainly by leading customers. HDR has therefore been widely deployed, thanks to the flexibility of 100G HDR100 and 200G HDR.

InfiniBand HDR Switch

There are two types of NVIDIA InfiniBand HDR switches. One is the HDR CS8500 modular chassis switch, a 29U switch providing up to 800 HDR 200Gb/s ports; each 200G port can be split into 2x100G, supporting up to 1600 HDR100 100Gb/s ports. The other is the QM87xx series fixed switch, whose 1U panel integrates 40 200G QSFP56 ports that can be split into up to 80 HDR100 100G ports to connect 100G HDR network cards. Each port also supports the EDR rate for backward compatibility with 100G EDR NICs. Note that a 200G HDR port can only be rate-reduced to 100G to connect one EDR NIC; it cannot be split into 2x100G to connect two EDR NICs.

QM8700 front panel

There are two types of 200G HDR QM87xx switches: the QM8700 and the QM8790. The only difference between the two models is the management method: the QM8700 has a management port for out-of-band management, while the QM8790 requires the NVIDIA Unified Fabric Manager (UFM®) platform for management.
QM8790 front panel

For both the QM8700 and QM8790, each model comes in two airflow options. MQM8790-HS2F has P2C (Power to Cable) airflow, marked by a blue label on the fan module; MQM8790-HS2R has C2P (Cable to Power) airflow, marked by a red label on the fan module. If you cannot remember the color coding, you can also identify the airflow direction by holding a hand in front of the switch's air inlet and outlet. The QM87xx series switch models are as follows:

| Switch Model | Interface Type | 200G Ports | 100G Ports (split) | Rack Height | Management |
|---|---|---|---|---|---|
| MQM8700-HS2F | 40x QSFP56 200Gb/s | 40 | 80 | 1U | Internally managed (managed) |
| MQM8700-HS2R | 40x QSFP56 200Gb/s | 40 | 80 | 1U | Internally managed (managed) |
| MQM8790-HS2F | 40x QSFP56 200Gb/s | 40 | 80 | 1U | Externally managed (unmanaged) |
| MQM8790-HS2R | 40x QSFP56 200Gb/s | 40 | 80 | 1U | Externally managed (unmanaged) |

Related Post:
[NVIDIA QM87xx Series HDR Switches Comparison: QM8700 vs QM8790]
NVIDIA QM8700 & QM8790 HDR Switch - Transceiver and Cable Support Matrix

There are generally two common connectivity applications for the QM8700 and QM8790 switches. One is interconnection with 200G HDR network cards, which can be cabled directly with 200G-to-200G AOC/DAC cables. The other is interconnection with 100G HDR100 network cards, which uses a 200G-to-2x100G splitter cable to split one physical 200G (4x50G) QSFP56 switch port into two virtual 100G (2x50G) ports. After the split, the port symbol changes from x/y to x/y/z, where x/y is the port's symbol before the split and z (1 or 2) is the number of the split port; each sub-port is then treated as a single port.
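A small sketch of the bookkeeping, following the naming convention as the article states it; the helper and counts are illustrative only.

```python
# Port-splitting bookkeeping for a 40-port HDR switch: a 200G port "x/y"
# becomes two 100G HDR100 ports "x/y/1" and "x/y/2".
SWITCH_PORTS = 40

def split_labels(port_label: str) -> list[str]:
    return [f"{port_label}/{z}" for z in (1, 2)]

print(split_labels("1/1"))    # ['1/1/1', '1/1/2']

# Attaching 80 HDR100 NICs to one QM87xx switch therefore needs
# 80 / 2 = 40 splitter cables, i.e. every 200G port is split.
hdr100_nics = 80
print(f"Splitter cables needed: {hdr100_nics // 2} of {SWITCH_PORTS} ports")
```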
Typical topology of an HDR two-layer fat tree

InfiniBand HDR NIC Cards

Compared with HDR switches, there are many types of HDR network interface cards (NICs). In terms of speed, there are two options: HDR100 and HDR.

The HDR100 NIC supports a transmission rate of 100Gb/s, and two HDR100 ports can be connected to an HDR switch through a 200G HDR to 2x100G HDR100 splitter cable. Unlike a 100G EDR adapter, the 100G port of an HDR100 NIC supports both 4x25G NRZ and 2x50G PAM4 transmission.

The 200G HDR network card supports a transmission rate of 200Gb/s and can be connected to the switch directly with a 200G cable.

In addition to the two data rates, NICs at each rate are available with single or dual ports and different PCIe host interfaces, according to business needs. Commonly used InfiniBand HDR NIC models are listed below:

| NIC Model | InfiniBand Data Rates | Ethernet Data Rates | Ports | Host Interface (PCIe) |
|---|---|---|---|---|
| MCX653105A-ECAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Single QSFP56 | PCIe 3.0/4.0 x16 |
| MCX651105A-EDAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Single QSFP56 | PCIe 4.0 x8 |
| MCX653106A-ECAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Dual QSFP56 | PCIe 3.0/4.0 x16 |
| MCX653105A-HDAT | HDR, HDR100, EDR, FDR, QDR, DDR, SDR | 200/100/50/40/25/10G | Single QSFP56 | PCIe 3.0/4.0 x16 |
| MCX653106A-HDAT | HDR, HDR100, EDR, FDR, QDR, DDR, SDR | 200/100/50/40/25/10G | Dual QSFP56 | PCIe 3.0/4.0 x16 |
| MCX653105A-EFAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Single QSFP56 | PCIe 3.0/4.0 x16, Socket Direct 2x8 in a row |
| MCX653106A-EFAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Dual QSFP56 | PCIe 3.0/4.0 x16, Socket Direct 2x8 in a row |
| MCX654105A-HCAT | HDR, HDR100, EDR, FDR, QDR, DDR, SDR | 200/100/50/40/25/10G | Single QSFP56 | Socket Direct 2x PCIe 3.0 x16 |
| MCX654106A-HCAT | HDR, HDR100, EDR, FDR, QDR, DDR, SDR | 200/100/50/40/25/10G | Dual QSFP56 | Socket Direct 2x PCIe 3.0 x16 |
| MCX654106A-ECAT | HDR100, EDR, FDR, QDR, DDR, SDR | 100/50/40/25/10G | Dual QSFP56 | Socket Direct 2x PCIe 3.0 x16 |

The HDR InfiniBand network architecture is simple, but the hardware options are varied. At 100Gb/s there are both 100G EDR and 100G HDR100 solutions; at 200Gb/s there are both HDR and 200G NDR200. The switches, network cards, and accessories used in different applications differ significantly. NADDOD offers high-performance InfiniBand HDR and EDR switches, SmartNICs, and NADDOD/Mellanox/Cisco/HPE AOC, DAC, and optical module portfolios, providing advantageous and valuable optical networking products and total solutions for data center, high-performance computing, edge computing, artificial intelligence, and other application scenarios, significantly improving customers' business acceleration capabilities at low cost and with excellent performance.


FAQs on InfiniBand vs Ethernet, Fibre Channel, and Omni-Path

What Is the Difference Between InfiniBand vs Ethernet?

InfiniBand and Ethernet are two important communication technologies used for data transfer, but they have different applications and benefits.

As mentioned at the beginning, InfiniBand is a high-speed, low-latency data transfer technology designed for high-performance computing (HPC) and AI. Even its starting speed, InfiniBand SDR 10Gb/s, was much faster than Gigabit Ethernet. Today the dominant InfiniBand speeds are 100G EDR and 200G HDR, with the faster 400G NDR and 800G XDR on the way. InfiniBand's latency requirements are also strict, approaching zero.

InfiniBand can be used for direct connections between two computers and allows faster data transfer rates than Ethernet (see the speed comparison in section 2, Higher Bandwidth, above), which makes it ideal for supercomputing applications that require fast and precise data processing and transfer, such as large-volume data analysis, machine learning, deep learning training and inference, conversational AI, and prediction and forecasting.

Ethernet, on the other hand, is used to link computers to the internet and to one another. Although it is slower than InfiniBand, it is more reliable and far more widely deployed, which makes it well suited to LAN applications that demand reliable and consistent data transfer.

These two technologies primarily diverge in speed and reliability. In HPC networking, InfiniBand is the best option for applications that require rapid data transfer, because it is faster and has lower latency than Ethernet, which is why it dominates server-system and storage-network interconnects. Ethernet, in turn, is relatively slower but more ubiquitous and reliable, making it the preferred option for applications that demand consistent data transfer, mainly in LAN networks.

What Is the Difference Between InfiniBand vs Fibre Channel?

Fibre Channel, used mainly in data center Storage Area Networks (SANs), is a storage networking technology for high-speed data transfer between servers, storage devices, and client nodes. It is used to connect computers to storage units such as solid-state drives and hard drives. Data transfer over this dedicated and secure channel technology is fast and reliable, and Fibre Channel is a dependable, scalable technology frequently used in enterprise storage solutions.

From the definitions of InfiniBand and Fibre Channel, we can see that the primary distinction between the two technologies is the type of data transfer each typically facilitates.

Comparing the three technologies, Ethernet is a good choice for client-server connections in a LAN environment, Fibre Channel is best for storage applications in a SAN, and InfiniBand is a newer technology best suited to linking CPU-memory complexes to I/O in an I/O area network (IAN). A fabric-based IAN allows for clustering and connections to I/O controllers.

| Technology | Standards Body | First Standard | Primary Application |
|---|---|---|---|
| Ethernet | IEEE | 1999 | Local Area Network (LAN) |
| Fibre Channel | ANSI | 1988 | Storage Area Network (SAN) |
| InfiniBand Architecture | InfiniBand Trade Association | 2001 | I/O Area Network (IAN) |

What Is the Difference Between InfiniBand vs Omni-Path?

Although NVIDIA has launched the InfiniBand 400G NDR solution, some customers are still using 100G EDR today. For 100Gb/s high-performance data center networks, Omni-Path and InfiniBand are both commonly used solutions. Although the rate is the same and the performance is similar, the network structures of Omni-Path and InfiniBand differ significantly. Taking a 400-node cluster as an example, an InfiniBand network needs only 15 NVIDIA Quantum QM8700-series switches, 200 pcs of 200G HDR to 2x100G HDR100 active fiber splitter cables, and 200 pcs of 200G HDR to 200G HDR active optical cables, whereas an Omni-Path network requires 24 switches and 876 pcs of 100G active optical cables (for 384 nodes). The InfiniBand solution has clear advantages in up-front equipment cost and later operation and maintenance cost, and its overall power consumption is much lower than Omni-Path's, which is more environmentally friendly.
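The InfiniBand-side counts in this example can be reproduced with the following sketch, assuming a non-blocking two-level fat tree of 40-port HDR switches with HDR100 end nodes attached through splitter cables; the Omni-Path figures are quoted from the comparison above and are not re-derived here.

```python
# Reproducing the InfiniBand-side counts for the 400-node example,
# assuming a non-blocking two-level fat tree of 40-port HDR switches with
# 100G HDR100 end nodes attached through 200G-to-2x100G splitter cables.
NODES = 400
RADIX = 40                                        # ports per HDR switch

splitter_cables = NODES // 2                      # 2 nodes per splitter -> 200
leaf_switches = splitter_cables // (RADIX // 2)   # 20 downlinks per leaf -> 10
uplink_aocs = leaf_switches * (RADIX // 2)        # 20 uplinks per leaf  -> 200
spine_switches = uplink_aocs // RADIX             # -> 5

print(f"Leaf switches:   {leaf_switches}")
print(f"Spine switches:  {spine_switches}")
print(f"Total switches:  {leaf_switches + spine_switches}")   # 15
print(f"Splitter cables: {splitter_cables}, 200G AOCs: {uplink_aocs}")
```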

Related Resources:
NADDOD InfiniBand Cables & Transceivers Products
NADDOD High-Performance Computing (HPC) InfiniBand Network Solution
NADDOD Helped the National Supercomputing Center to Build a General-Purpose Test Platform for HPC
Why Choose NADDOD InfiniBand Solutions?
Why Is InfiniBand Used in HPC?
InfiniBand Network Technology for HPC and AI: In-Network Computing
InfiniBand Trend Review: Beyond Bandwidth and Latency
Why Autonomous Vehicles Are Using InfiniBand?
Active Optical Cable Jacket Explained: OFNR vs OFNP vs PVC vs LSZH?
What Is InfiniBand Network and Its Architecture?