What is InfiniBand?
Since the beginning of the 21st century, with the increasing popularity of cloud computing and big data, data centers have developed rapidly, and InfiniBand occupies a very important position as a key data center technology. In particular, starting in 2023, the rise of large AI models represented by ChatGPT has further increased the focus on InfiniBand, because the network used to train GPT is built on NVIDIA's InfiniBand.
So what exactly is InfiniBand technology? Why is it so popular? What is the oft-discussed "InfiniBand vs. Ethernet" debate? This article will answer each of these questions.
InfiniBand (IB for short) is a powerful communication protocol. To tell the story of its birth, we need to start with the architecture of computers.
As we all know, modern digital computers have used the von Neumann architecture since their inception. In this architecture, there is a CPU (arithmetic logic unit and control unit), memory (RAM, hard disk), and I/O (input/output) devices.
In the early 1990s, in order to support more and more external devices, Intel was the first to introduce the Peripheral Component Interconnect (PCI) bus design into the standard PC architecture.
Shortly thereafter, the Internet entered a phase of rapid development. The continuous growth of online businesses and user base posed significant challenges to the capacity of IT systems.
At that time, with the support of Moore's Law, components such as CPUs, memory, and hard drives were rapidly advancing. However, the PCI bus was upgrading at a slower pace, greatly limiting I/O performance and becoming a bottleneck for the entire system.
To address this issue, Intel, Microsoft, and SUN led the development of the "Next Generation I/O (NGIO)" technology standard, while IBM, Compaq, and Hewlett-Packard led the development of "Future I/O (FIO)." The latter three companies also jointly created the PCI-X standard in 1998.
In 1999, the FIO Developers Forum and NGIO Forum merged to establish the InfiniBand Trade Association (IBTA).
Soon, in the year 2000, the 1.0 version of the InfiniBand Architecture Specification was officially released.
Simply put, the birth purpose of InfiniBand was to replace the PCI bus. It introduced the RDMA protocol, offering lower latency, higher bandwidth, and greater reliability, thereby enabling more powerful I/O performance.
When it comes to InfiniBand, one company that must be mentioned is the renowned Mellanox.
In May 1999, several employees who had resigned from Intel and Galileo Technology founded a chip company in Israel, naming it Mellanox.
After its establishment, Mellanox joined NGIO. Later, NGIO and FIO merged, and Mellanox subsequently became part of the InfiniBand camp. In 2001, they introduced their first InfiniBand product.
In 2002, a significant change occurred within the InfiniBand camp.
That year, Intel made a sudden decision to shift its focus to developing PCI Express (PCIe), which was launched in 2004. Another major player, Microsoft, also withdrew from InfiniBand development.
Although companies like SUN and Hitachi chose to persist, a shadow was cast over InfiniBand's development.
Starting in 2003, InfiniBand shifted towards a new application domain, which was computer cluster interconnectivity.
During that year, Virginia Tech created a cluster based on InfiniBand technology, ranking third in the TOP500 list (a global ranking of supercomputers).
In 2004, another significant InfiniBand non-profit organization was established—the Open Fabrics Alliance (OFA).
OFA and IBTA have a collaborative relationship: IBTA is primarily responsible for developing, maintaining, and enhancing the InfiniBand protocol standards, while OFA is responsible for developing and maintaining the open-source InfiniBand software stack and higher-level application APIs.
In 2005, InfiniBand found another new application scenario—the connection of storage devices.
During that time, InfiniBand and Fibre Channel (FC) were popular SAN (Storage Area Network) technologies. It was at this time that many people became aware of InfiniBand technology.
Subsequently, InfiniBand technology gradually gained popularity, attracting an increasing number of users, and its market share continued to rise.
By 2009, there were already 181 systems utilizing InfiniBand technology in the TOP500 list. (Of course, Gigabit Ethernet was still the mainstream with 259 systems.)
As InfiniBand started to rise, Mellanox also grew continuously, gradually becoming a leader in the InfiniBand market.
In 2010, Mellanox merged with Voltaire, leaving Mellanox and QLogic as the primary InfiniBand suppliers. Soon after, in 2012, Intel acquired QLogic's InfiniBand technology, reentering the competition in the InfiniBand market.
After 2012, with the continuous growth of high-performance computing (HPC) demands, InfiniBand technology continued to make significant progress, increasing its market share.
In 2015, InfiniBand technology's share in the TOP500 list exceeded 50% for the first time, reaching 51.4% (257 systems).
This marked the first time InfiniBand technology had successfully challenged Ethernet technology. InfiniBand became the preferred internal interconnect technology for supercomputers.
In 2013, Mellanox made further advancements by acquiring silicon photonics technology company Kotura and parallel optical interconnect chip manufacturer IPtronics, further strengthening its industry presence. By 2015, Mellanox had captured an 80% share of the global InfiniBand market. Their business scope expanded from chips to encompass network cards, switches/gateways, remote communication systems, cables, and modules, establishing themselves as a world-class networking provider.
In the face of InfiniBand's progress, Ethernet did not remain idle.
In April 2010, the IBTA introduced RoCE (RDMA over Converged Ethernet), which "ported" the RDMA technology from InfiniBand to Ethernet. In 2014, they proposed a more mature version, RoCE v2.
With RoCE v2, Ethernet significantly narrowed the technological performance gap with InfiniBand. Combined with its inherent cost and compatibility advantages, Ethernet began to make a comeback.
The chart below shows the technology shares in the TOP500 list from 2007 to 2021.
As shown in the graph, starting from 2015, 25G and higher-speed Ethernet (represented by the dark green line) began to rise and quickly became the industry favorite, temporarily suppressing InfiniBand.
In 2019, Nvidia made a bold move by acquiring Mellanox for a staggering $6.9 billion, surpassing rival offers from Intel and Microsoft, who bid $6 billion and $5.5 billion, respectively. The acquisition further strengthened Nvidia's position in high-performance computing and data centers, and made the company a significant player in the networking technology market as well.
According to Nvidia CEO Jensen Huang, the reason for the acquisition was explained as follows: "This is the combination of two leading global high-performance computing companies. We focus on accelerated computing, while Mellanox specializes in interconnect and storage."
In hindsight, his decision appears to have been very visionary.
As we can see, with the rise of large AI language models like GPT-3, there has been an exponential surge in the demand for high-performance computing and intelligent computing across society.
To support such a massive computational demand, high-performance computing clusters are essential. In terms of performance, InfiniBand is considered the top choice for high-performance computing clusters.
By combining their own GPU computing power with Mellanox's networking expertise, Nvidia has effectively created a powerful "computing engine." In terms of computational infrastructure, Nvidia undoubtedly holds a leading advantage.
Today, the competition in high-performance networking is between InfiniBand and high-speed Ethernet. Both sides are evenly matched. Manufacturers with abundant resources are more likely to choose InfiniBand, while those seeking cost-effectiveness may lean towards high-speed Ethernet.
Other interconnect technologies also remain, such as IBM's BlueGene, Cray's proprietary interconnects, and Intel's OmniPath, but they generally belong to the second tier of options.
The Technical Principles of InfiniBand
After introducing the development history of InfiniBand, let's now take a look at its working principle and why it is superior to traditional Ethernet. How does it achieve low latency and high performance?
As mentioned earlier, one of the most prominent advantages of InfiniBand is its early adoption of the Remote Direct Memory Access (RDMA) protocol.
In traditional TCP/IP, data from the network card is first copied to the main memory and then further copied to the application's storage space. Similarly, data from the application space is copied to the main memory before being sent out through the network card to the Internet.
This I/O operation requires intermediate copying in the main memory, which increases the length of the data transfer path, adds burden to the CPU, and introduces transmission latency.
RDMA can be seen as a technology that "eliminates intermediaries."
With RDMA's kernel bypass mechanism, it enables direct data reads and writes between applications and the network card, reducing data transmission latency within servers to nearly 1 microsecond.
Furthermore, RDMA's zero-copy mechanism allows the receiving end to read data directly from the sender's registered memory, bypassing intermediate copies through main memory. This greatly reduces CPU burden and improves CPU efficiency.
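The difference between the two data paths can be sketched as a toy copy-counting model. This is only an illustration of the copies described above, not real RDMA, which requires RDMA-capable NICs and a verbs library such as libibverbs; all names here are hypothetical.

```python
# Toy model contrasting the two receive paths: how many CPU-driven
# copies does each one perform before the application sees the data?

def tcp_ip_receive(nic_buffer: bytes) -> tuple[bytes, int]:
    """Traditional path: NIC -> kernel memory -> application buffer."""
    kernel_buffer = bytes(nic_buffer)   # copy 1: into main memory (kernel)
    app_buffer = bytes(kernel_buffer)   # copy 2: into application space
    return app_buffer, 2

def rdma_receive(nic_buffer: bytes) -> tuple[bytes, int]:
    """RDMA path: the NIC writes straight into a pre-registered
    application buffer, so the CPU performs no intermediate copy."""
    app_buffer = nic_buffer             # zero-copy: same memory, no CPU copy
    return app_buffer, 0

data = b"payload"
_, tcp_copies = tcp_ip_receive(data)
_, rdma_copies = rdma_receive(data)
print(tcp_copies, rdma_copies)  # 2 0
```

The point of the sketch is only the copy count: two CPU-driven copies on the traditional path versus none on the RDMA path, which is where the latency and CPU savings come from.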
As mentioned earlier, the rapid rise of InfiniBand can be attributed to the significant contributions of RDMA.
InfiniBand Network Architecture
The network topology structure of InfiniBand is illustrated in the diagram below:
InfiniBand is a channel-based architecture, and its components can be mainly categorized into four types:
· HCA (Host Channel Adapter)
· TCA (Target Channel Adapter)
· InfiniBand links (connecting channels, which can be cables or fibers, or even on-board links)
· InfiniBand switches and routers (used for networking)
Channel adapters are used to establish InfiniBand channels. All transmissions start or end with a channel adapter to ensure security or operate at a given Quality of Service (QoS) level.
Systems using InfiniBand can be composed of multiple subnets, with each subnet capable of accommodating over 60,000 nodes. Within a subnet, InfiniBand switches perform layer 2 processing. Between subnets, routers or bridges are used for connectivity.
Layer 2 processing in InfiniBand is straightforward. Each InfiniBand subnet has a subnet manager that assigns each node a 16-bit Local Identifier (LID). InfiniBand switches consist of multiple InfiniBand ports and forward data packets from one port to another based on the destination LID in the Layer 2 Local Routing Header. Apart from management packets, switches neither consume nor generate packets.
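The LID-based forwarding just described amounts to a simple table lookup, which is why switch logic can stay so lean. Below is a minimal sketch of that idea; the class and method names are illustrative, not an actual switch API.

```python
# Sketch of InfiniBand Layer-2 forwarding: the switch forwards each packet
# purely by looking up its 16-bit destination LID in a table that the
# subnet manager has programmed. The switch never inspects the payload.

class IBSwitch:
    def __init__(self):
        self.forwarding_table: dict[int, int] = {}  # dest LID -> output port

    def program(self, lid: int, port: int) -> None:
        """Conceptually invoked by the subnet manager."""
        assert 0 <= lid <= 0xFFFF, "LIDs are 16-bit values"
        self.forwarding_table[lid] = port

    def forward(self, dest_lid: int) -> int:
        """Return the output port for a packet addressed to dest_lid.
        The switch neither modifies nor consumes the data packet."""
        return self.forwarding_table[dest_lid]

sw = IBSwitch()
sw.program(0x0001, 3)
sw.program(0x0002, 7)
print(sw.forward(0x0002))  # 7
```

Because forwarding is a single lookup on a fixed-size identifier, it can be done entirely in hardware with no header rewriting, which is part of what makes the sub-100 ns latency mentioned below achievable.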
With this simple Layer 2 processing and cut-through forwarding, InfiniBand significantly reduces forwarding latency to below 100 ns, notably faster than traditional Ethernet switches.
In an InfiniBand network, data is likewise transmitted serially in the form of packets, with a maximum packet size of 4 KB.
InfiniBand Protocol Stack
The physical layer defines how bit signals are composed into symbols on the wire, and further into frames, data symbols, and data padding between packets. It provides detailed specifications for signaling protocols to construct efficient packets.
The link layer defines the format of data packets and protocols for packet operations such as flow control, routing selection, encoding, and decoding.
The network layer performs routing selection by adding a 40-byte Global Route Header (GRH) to the data packet, enabling data forwarding.
During forwarding, routers recompute only the variant CRC field; the invariant CRC is carried unchanged, ensuring end-to-end data transmission integrity.
The transport layer further delivers the data packet to a designated Queue Pair (QP) and instructs the QP on how to process the packet.
It can be observed that InfiniBand has its own defined layers 1-4, making it a complete network protocol. End-to-end flow control forms the foundation of InfiniBand network packet transmission and reception, enabling lossless networks.
Speaking of Queue Pairs (QPs), a few more points are worth mentioning. They are the fundamental communication units in RDMA technology.
A Queue Pair consists of two queues: the Send Queue (SQ) and the Receive Queue (RQ). When a user invokes the API to send or receive data, they are actually posting work requests to the QP. The requests in the QP are then processed one by one using a polling mechanism.
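The post-then-poll flow can be mirrored in a short sketch. Real QPs live in a verbs library (e.g. libibverbs) and involve completion queues and hardware doorbells; this toy model, with illustrative names, only reproduces the queue semantics described above.

```python
# Toy Queue Pair: work requests are posted to the Send Queue (SQ) or
# Receive Queue (RQ), then drained one by one by polling.

from collections import deque

class QueuePair:
    def __init__(self):
        self.send_queue = deque()   # SQ: outgoing work requests
        self.recv_queue = deque()   # RQ: posted receive buffers
        self.completions = []       # finished work requests

    def post_send(self, payload) -> None:
        self.send_queue.append(payload)

    def post_recv(self, buffer) -> None:
        self.recv_queue.append(buffer)

    def poll(self) -> int:
        """Process pending send requests in order (polling);
        return the number of completions generated."""
        done = 0
        while self.send_queue:
            self.completions.append(self.send_queue.popleft())
            done += 1
        return done

qp = QueuePair()
qp.post_send(b"msg1")
qp.post_send(b"msg2")
print(qp.poll())  # 2
```

Note the design choice this models: the application never hands data to the kernel, it only enqueues work requests, and completion is discovered by polling rather than by interrupt, which keeps latency low.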
InfiniBand Link Rate
InfiniBand links can be established using either copper cables or fiber optic cables. Depending on the specific connection requirements, dedicated InfiniBand cables are used.
InfiniBand defines multiple link speeds at the physical layer, such as 1X, 4X, and 12X. Each individual link is a four-wire serial differential connection, with two wires in each direction.
Taking the example of the early SDR (Single Data Rate) specification, the original signal bandwidth for a 1X link was 2.5 Gbps, while a 4X link had a bandwidth of 10 Gbps, and a 12X link had a bandwidth of 30 Gbps.
The actual data bandwidth for a 1X link was 2.0 Gbps due to the use of 8b/10b encoding. Since the link is bidirectional, the total bandwidth relative to the bus is 4 Gbps.
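The SDR arithmetic above can be written out explicitly: 8b/10b encoding carries 8 data bits in every 10 signal bits, so effective bandwidth is 80% of the raw signaling rate, and a bidirectional link doubles the total. The helper function below is just this calculation, not any real API.

```python
# SDR link-rate arithmetic for 1X, 4X, and 12X link widths.

def sdr_bandwidth(width: int) -> dict:
    raw_gbps = 2.5 * width          # 2.5 Gbps raw signaling per lane (SDR)
    data_gbps = raw_gbps * 8 / 10   # 8b/10b encoding: 8 data bits per 10 bits
    return {
        "raw": raw_gbps,
        "data": data_gbps,
        "bidirectional": data_gbps * 2,  # both directions combined
    }

for width in (1, 4, 12):
    print(f"{width}X:", sdr_bandwidth(width))
# 1X -> 2.5 raw, 2.0 data, 4.0 bidirectional (matching the figures above)
```

The same 8/10 factor explains why later generations moved to more efficient encodings such as 64b/66b, which waste far less of the raw signaling rate.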
Over time, InfiniBand's network bandwidth has continuously upgraded, progressing from the early SDR, DDR, QDR, FDR, EDR, and HDR to NDR, XDR, and GDR, as shown in the diagram below:
The Commercial Products of InfiniBand
Finally, let's take a look at the commercial InfiniBand products available on the market.
After NVIDIA's acquisition of Mellanox, they introduced their own seventh-generation NVIDIA InfiniBand architecture platform called NVIDIA Quantum-2 in 2021.
The NVIDIA Quantum-2 platform includes the following components: NVIDIA Quantum-2 series switches, NVIDIA ConnectX-7 InfiniBand adapters, BlueField-3 InfiniBand DPU, and related software.
The NVIDIA Quantum-2 series switches are designed in a compact 1U form factor and are available in both air-cooled and liquid-cooled versions. The switches are built using a 7nm chip fabrication process and a single chip contains 57 billion transistors (even more than the A100 GPU). They offer flexible configurations with 64 400Gbps ports or 128 200Gbps ports, providing a total bidirectional throughput of 51.2Tbps.
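The 51.2 Tbps figure follows directly from the port counts: both configurations multiply out to the same total once both directions are counted. A quick check of that arithmetic (the function name is ours, purely illustrative):

```python
# Verify the Quantum-2 throughput figure: ports x per-port rate x 2
# directions, converted from Gbps to Tbps.

def total_bidirectional_tbps(ports: int, gbps_per_port: int) -> float:
    return ports * gbps_per_port * 2 / 1000

print(total_bidirectional_tbps(64, 400))   # 51.2
print(total_bidirectional_tbps(128, 200))  # 51.2
```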
The NVIDIA ConnectX-7 InfiniBand adapters support PCIe Gen4 and Gen5 and come in various form factors, offering single or dual network ports with a throughput of 400Gbps.
When connecting switches, network cards, and adapters in an IB network, choosing high-quality InfiniBand cables is essential for ensuring smooth network connectivity. Currently, the original InfiniBand HDR, NDR, and EDR AOC/DAC cables from NVIDIA are expensive and in short supply, which poses an obstacle for companies looking to quickly deploy high-performance computing networks.
As a leading provider of overall optical network solutions, NADDOD offers lossless network solutions based on InfiniBand and RoCE (RDMA over Converged Ethernet) to build lossless network environments and high-performance computing capabilities for users. NADDOD can choose the optimal solution tailored to specific situations and user requirements, providing high bandwidth, low latency, and high-performance data transmission to effectively address network bottleneck issues and enhance network performance and user experience.
NADDOD manufactures InfiniBand AOC/DAC cables that meet connectivity requirements across distances from 0.5 m to 100 m, supporting various rates including NDR, HDR, EDR, FDR, and more. Additionally, they offer fast delivery, free sample trials, lifetime warranty, and technical support. With their excellent customer service and products, NADDOD provides superior performance while reducing costs and complexity, catering to server clusters' high-performance needs.
With a professional technical team and extensive experience in implementing and servicing various application scenarios, NADDOD's products and solutions have gained trust and popularity among customers, widely applied in industries and critical fields such as high-performance computing, data centers, education and research, biomedicine, finance, energy, autonomous driving, internet, manufacturing, and telecommunications.
In conclusion, the future of InfiniBand looks promising, driven by high-performance computing and artificial intelligence.
InfiniBand, as a high-performance and low-latency interconnect technology, has been widely adopted in large-scale computing clusters and supercomputers. It provides higher bandwidth and lower latency to meet the demands of large-scale data transfers and high-concurrency computing. InfiniBand also supports more flexible topologies and complex communication patterns, giving it a unique advantage in high-performance computing and AI domains.
However, Ethernet, as a widely adopted networking technology, is also evolving. With increasing Ethernet speeds and technological advancements, it has solidified its position in data centers and has caught up with InfiniBand in certain aspects. Ethernet has a broad ecosystem and mature standardization support, making it easier to deploy and manage in general data center environments.
As technology continues to evolve and demands change, both InfiniBand and Ethernet may play to their respective strengths in different application scenarios. Whether InfiniBand or Ethernet will have the last laugh, only time will tell. They will continue to drive the development and innovation of information technology, meeting the growing bandwidth demands and providing efficient data transmission and processing capabilities.