Performance Enhancements of NVIDIA Mellanox NDR 400Gb/s InfiniBand

As CPU and communication speeds continue to rise, with 10Gbps and 100Gbps links becoming increasingly common, traditional I/O standards and systems such as PCI, Ethernet, and Fibre Channel may struggle to keep up. Upgrading existing devices or products to high-speed communication systems has therefore become a common challenge for IT professionals.

The InfiniBand (IB) standard emerged to address the communication bottlenecks of traditional I/O architectures such as PCI. It uses a point-to-point, switched fabric architecture to improve fault tolerance and scalability, and its original 4X link delivers 10Gbps in hardware (four lanes, each a 2.5Gbps bidirectional connection). It also uses virtual lanes (VLs) to implement QoS and protects data integrity with CRC checks. This article introduces the InfiniBand technology standard and its key components.
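The link arithmetic above can be sketched in a few lines. This is illustrative only: the function and parameter names are hypothetical, and the 8b/10b line encoding (8 data bits carried in every 10 transmitted bits) is the scheme used by the early InfiniBand speeds, stated here as an assumption about the link being modeled.

```python
# Illustrative arithmetic only; names are hypothetical, not part of any InfiniBand API.

def link_rates_gbps(lanes: int, lane_signal_gbps: float,
                    payload_bits: int = 8, coded_bits: int = 10) -> tuple:
    """Return (signaling rate, data rate) in Gb/s for one direction of a link.

    Assumes 8b/10b line encoding by default: 8 data bits per 10 bits on the wire.
    """
    signaling = lanes * lane_signal_gbps
    data = signaling * payload_bits / coded_bits
    return signaling, data


signaling, data = link_rates_gbps(lanes=4, lane_signal_gbps=2.5)
print(f"4X link: {signaling} Gb/s signaling, {data} Gb/s data")
```

Under these assumptions, a four-lane link signals at 10 Gb/s and carries 8 Gb/s of data, which is why the same link is sometimes quoted with either figure.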

InfiniBand architecture

InfiniBand uses queue pairs (QPs), each consisting of a send queue and a receive queue, through which applications post work requests; the adapter can then move data directly between the network and application memory (known as Remote Direct Memory Access, or RDMA). In TCP/IP, by contrast, data arriving at the network card is first copied into kernel memory and then copied into the application's address space; on transmit, data is copied from application space into kernel memory and then sent to the network through the network card. Involving kernel memory lengthens the data transfer path, significantly slows I/O, and increases CPU load. InfiniBand-based protocols such as the Sockets Direct Protocol (SDP) can instead place data from the adapter directly into the user's application space, bypassing kernel memory. This approach, known as zero-copy, lets large transfers approach the maximum throughput the protocol can achieve.
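The difference between the two data paths can be illustrated with a small analogy. This is a conceptual sketch, not RDMA or verbs code: plain Python buffers stand in for the adapter, kernel memory, and application memory, and both function names are hypothetical.

```python
# Conceptual analogy only: Python buffers stand in for adapter, kernel, and
# application memory. Nothing here talks to real network hardware.

def receive_via_kernel(adapter_data: bytes, app_buffer: bytearray) -> int:
    """Buffered path: adapter -> kernel buffer -> application buffer."""
    kernel_buffer = bytearray(adapter_data)          # copy 1: adapter to kernel memory
    app_buffer[:len(kernel_buffer)] = kernel_buffer  # copy 2: kernel to application memory
    return 2 * len(adapter_data)                     # total bytes copied


def receive_direct(adapter_data: bytes, app_buffer: bytearray) -> int:
    """Direct path: data is placed straight into the application buffer."""
    target = memoryview(app_buffer)                  # a view of the buffer, no copy
    target[:len(adapter_data)] = adapter_data        # single placement into app memory
    return len(adapter_data)                         # total bytes copied


payload = b"x" * 4096
buf_a, buf_b = bytearray(4096), bytearray(4096)
print(receive_via_kernel(payload, buf_a), receive_direct(payload, buf_b))
```

Both calls leave the same bytes in the application buffer, but the buffered path moves twice as many bytes to get there. In a real system the extra copy also consumes CPU cycles, which is the cost the direct path avoids.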

The InfiniBand protocol adopts a layered structure, where each layer is independent and provides services to the layer above it. The physical layer defines how bits are formed into symbols on the wire and specifies the symbols used for framing (start and end of packet), for data, and for fill between packets. It provides detailed specifications for the signaling protocols needed to construct efficient packets. The link layer defines the format of data packets and the protocols for packet operations, such as flow control, routing selection within a subnet, encoding, and decoding. The network layer routes packets between subnets by adding a 40-byte global route header (GRH) to the data packet. During forwarding, a router recomputes only the variant CRC (VCRC); the invariant CRC (ICRC) is left untouched, which preserves the integrity of end-to-end data transmission. The transport layer then delivers the data packet to a specified Queue Pair (QP), tells the QP how to process the packet, and segments and reassembles the data when the message payload exceeds the Maximum Transfer Unit (MTU) of the channel.
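The transport layer's segmentation and reassembly step can be sketched as follows. This is a simplified model under stated assumptions: packets are represented as hypothetical `(sequence number, payload)` tuples, whereas the real protocol carries packet sequence numbers and opcodes in its transport headers. The 4096-byte MTU is one of the MTU sizes InfiniBand defines.

```python
# Simplified model of segmentation and reassembly; the (seq, payload) tuple
# format is a hypothetical stand-in for real transport headers.

def segment(message: bytes, mtu: int) -> list:
    """Split a message into payloads of at most `mtu` bytes, each tagged with a sequence number."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [(seq, message[off:off + mtu])
            for seq, off in enumerate(range(0, len(message), mtu))]


def reassemble(packets) -> bytes:
    """Rebuild the message by ordering packets on their sequence number."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


message = bytes(10_000)                    # a 10,000-byte message
packets = segment(message, mtu=4096)
print([len(payload) for _, payload in packets])
assert reassemble(reversed(packets)) == message
```

A 10,000-byte message with a 4096-byte MTU yields two full packets and one 1,808-byte remainder, and the receiver recovers the original message even when the packets are handed over out of order.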

InfiniBand Basic Components

The InfiniBand network topology is shown above, and its constituent units can be divided into four categories:

HCA (Host Channel Adapter): It serves as a bridge between the memory controller and the TCA.

TCA (Target Channel Adapter): It packages and sends digital signals from I/O devices (such as network cards, SCSI controllers) to the HCA.

InfiniBand link: It is the connection between the HCA and TCA, carried over copper or fiber optic cable. The InfiniBand architecture allows hardware vendors to connect TCA and HCA with link widths of 1, 4, or 12 lanes (1X, 4X, or 12X).

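Switches and routers: Switches forward packets between links within a subnet, and routers forward packets between subnets using the global route header. The HCA and TCA at the endpoints are essentially host adapters: programmable DMA (Direct Memory Access) engines with certain protection functions.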

InfiniBand Applications

In high-concurrency and high-performance computing applications that demand both high bandwidth and low latency, InfiniBand (IB) networking is a good fit: IB can serve as both the front-end and back-end network, or 10Gb Ethernet can serve as the front-end network with IB as the back-end. With its high bandwidth, low latency, high reliability, and ability to scale to very large clusters, together with RDMA technology and specialized protocol offload engines, InfiniBand can give storage customers sufficient bandwidth and lower response latency.

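IB supports the following bandwidth modes, with higher rates planned for the future. The figures are for 4X links; the SDR, DDR, and QDR values are data rates after 8b/10b encoding, so a 4X SDR link signals at 10 Gb/s but carries 8 Gb/s of data: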

SDR (Single Data Rate): 8 Gb/s

DDR (Double Data Rate): 16 Gb/s

QDR (Quad Data Rate): 32 Gb/s

FDR (Fourteen Data Rate): 56 Gb/s

EDR (Enhanced Data Rate): 100 Gb/s

HDR (High Data Rate): 200 Gb/s

NDR (Next Data Rate): 400 Gb/s+
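To put these rates in perspective, the sketch below estimates how long an idealized transfer of 1 TB would take at each 4X rate listed above. It is back-of-the-envelope arithmetic only: it ignores protocol overhead, congestion, and host-side limits, and the dictionary and function names are hypothetical.

```python
# Idealized transfer times at the 4X rates listed above. Real transfers are
# slower because of protocol overhead and host-side limits.

RATES_4X_GBPS = {
    "SDR": 8, "DDR": 16, "QDR": 32, "FDR": 56,
    "EDR": 100, "HDR": 200, "NDR": 400,
}


def transfer_seconds(size_bytes: int, rate_gbps: float) -> float:
    """Seconds needed to move `size_bytes` at `rate_gbps` (10**9 bits per second)."""
    return size_bytes * 8 / (rate_gbps * 1e9)


ONE_TB = 10**12  # decimal terabyte, in bytes
for name, rate in RATES_4X_GBPS.items():
    print(f"{name}: {transfer_seconds(ONE_TB, rate):7.1f} s for 1 TB")
```

Under these assumptions the same 1 TB that takes 1,000 seconds at the 8 Gb/s rate takes 20 seconds at 400 Gb/s.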

NVIDIA Mellanox NDR 400Gb/s InfiniBand improves several performance aspects

Increased Bandwidth: NDR InfiniBand offers a data transfer rate of 400 Gb/s, twice the 200 Gb/s of the previous HDR generation, enabling faster and more efficient communication between nodes in high-performance computing (HPC) clusters.

Lower Latency: The NDR InfiniBand architecture reduces communication latency, allowing for faster data transmission and improved responsiveness in HPC applications. This is especially crucial in scenarios that require real-time data processing or quick interactions between nodes.

Enhanced Scalability: With its higher bandwidth capabilities, NDR InfiniBand enables better scalability in large-scale HPC environments. It can handle increased data volumes and accommodate the growing demands of modern workloads, enabling efficient parallel processing and distributed computing across a cluster.

Improved Efficiency: The NDR InfiniBand architecture optimizes data transfer and communication protocols, reducing overhead and maximizing the utilization of network resources. This leads to improved overall system efficiency and higher application performance.

Advanced Features: NDR InfiniBand introduces new features and capabilities, such as enhanced error detection and correction mechanisms, improved congestion control algorithms, and support for advanced network topologies. These features contribute to better reliability, stability, and overall network performance.

Future-Proofing: By providing a higher data transfer rate, NDR InfiniBand offers a technology roadmap for future HPC requirements. It ensures that HPC clusters built on NDR InfiniBand can accommodate increasing data-intensive workloads and emerging technologies without requiring significant infrastructure upgrades.

Conclusion

NDR InfiniBand has attracted numerous partners to jointly build an ecosystem, including server manufacturers such as Atos, Dell Technologies, Fujitsu, Inspur, Lenovo, and SuperMicro, as well as storage manufacturers like DDN and IBM Storage. All these companies have started developing next-generation products that support NDR InfiniBand. Leading global users, including Microsoft Azure, Los Alamos National Laboratory, and Jülich Supercomputing Center in Europe, have said they look forward to adopting NDR InfiniBand quickly to take advantage of its technology.

Gilad Shainer, Senior Vice President of Networking at NVIDIA, stated, "The most important task for our AI customers is to process increasingly complex applications, which requires a faster, smarter, and more scalable network. NVIDIA Mellanox 400G InfiniBand, with its massive throughput and intelligent acceleration engines, helps HPC, AI, and hyperscale cloud infrastructures achieve unparalleled performance at lower costs and complexity."

The era of exascale AI and HPC has arrived, bringing new challenges. The programmable NDR InfiniBand product, which is software-defined, hardware-accelerated, and built for in-network computing, is scheduled to sample in the second quarter of 2021. The introduction of NDR products will significantly enhance the performance and efficiency of exascale AI and HPC systems, simplify system management and operations, reduce total cost of ownership (TCO), and protect data center investments.

Naddod has an abundant and stable inventory of optical modules and high-speed cables for InfiniBand NDR rates. We ensure fast delivery and have successfully completed product deliveries for multiple enterprises. After an order is placed, we guarantee a two-week delivery time. Each product is 100% tested on real devices, including test scenarios with tens of thousands of applications running simultaneously, which better reflects real-world application requirements. Our optical modules' bit error rate is even lower than the original factory level, and their power consumption and loss compare favorably with products available on the market. With multiple successful deliveries and real-world application cases, you don't need to worry about product quality or inventory. In addition to providing high-quality third-party optical modules, we also have ample stock of NVIDIA's original factory products. Feel free to inquire for more information.