NVIDIA Quantum-2 InfiniBand: Ultra-High Bandwidth and Low Latency | Product Features and Solutions - NADDOD Blog

What is NDR OSFP?

NADDOD Dylan InfiniBand Solutions Architect Sep 4, 2023

The NVIDIA Quantum-2 InfiniBand platform, with its ultra-high bandwidth, ultra-low latency, and simplified operation and maintenance, has become the preferred choice for artificial intelligence and large-scale cloud data centers. Compared to the previous generation, Quantum-2 delivers twice the port speed, three times the switch port density, five times the switch system capacity, and 32 times the switch AI acceleration capability. With a Dragonfly+ topology, a Quantum-2-based network can provide 400Gb/s connectivity to more than a million nodes within three hops, giving AI developers and researchers the network performance to tackle globally challenging problems.

1. Quantum-2 InfiniBand Switch

The NVIDIA Quantum-2 fixed-configuration switch comes in two main models: QM9790 and QM9700. The main difference between them is management: the QM9700 has a management interface for external management support, while the QM9790 does not. There is no difference in port types or speeds. The QM9700 series switch supports flexible combinations of 64 400G ports or 128 200G ports. It is important to note that the QM9700 adopts a 1U design with a single panel of 32 OSFP cages. Each OSFP cage carries 2x400G, and within the switch the 64 400G ports are displayed in the form IB1/**/1 and IB1/**/2.


The NVIDIA Quantum-2 is equipped with the third-generation NVIDIA SHARP technology, which enables almost infinite scalability for aggregating network data of various scales. Its AI acceleration capability is 32 times higher than the previous generation. Furthermore, the third-generation SHARP technology supports multiple tenants or parallel applications sharing infrastructure without compromising performance. The MPI_Alltoall acceleration, MPI tag matching hardware engine, and other features such as advanced congestion control, dynamic routing, and self-healing network provide crucial enhancements for high-performance computing (HPC) and AI clusters, thereby elevating their performance to new heights.

MPI

2. ConnectX-7 HCA Network Card

The NVIDIA ConnectX-7 InfiniBand network card (HCA) ASIC provides a data throughput of 400Gb/s and supports 32 lanes of PCIe 5.0 or PCIe 4.0 host interface. The 400Gb/s InfiniBand utilizes advanced SerDes technology with 100Gb/s per lane, and the physical connection is achieved through OSFP connectors on the switch and HCA ports. Each OSFP connector on the switch supports two 400Gb/s InfiniBand ports or 200Gb/s InfiniBand ports. The OSFP connector on the network card HCA supports one 400Gb/s InfiniBand port. The 400Gb/s cable products include active and passive copper cables, transceivers, and MPO fiber cables. It is important to note that although both the network card and switch use OSFP packaging, there are differences in physical dimensions. The OSFP module on the switch side is equipped with heat fins, while the network card side uses an OSFP-RHS structure without heat fins, relying on auxiliary heat dissipation modules on the network card ports for cooling.
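As a back-of-the-envelope check of our own (only the lane counts and rates quoted above come from the text), the PCIe 5.0 host interface comfortably feeds a 400Gb/s port:

```python
# Illustrative arithmetic, not vendor data: can the ConnectX-7's PCIe 5.0
# host interface sustain a 400Gb/s InfiniBand port?

PCIE_GEN5_GTS_PER_LANE = 32          # GT/s per lane for PCIe 5.0
ENCODING_EFFICIENCY = 128 / 130      # 128b/130b line encoding

def pcie_usable_gbps(lanes: int) -> float:
    """Approximate usable PCIe bandwidth in Gb/s (ignores protocol overhead)."""
    return lanes * PCIE_GEN5_GTS_PER_LANE * ENCODING_EFFICIENCY

# A x16 Gen5 link already exceeds one 400Gb/s NDR port...
print(round(pcie_usable_gbps(16), 1))   # ~504 Gb/s
# ...and the card's full 32-lane interface leaves headroom for a second port.
print(round(pcie_usable_gbps(32), 1))   # ~1008 Gb/s
```

The same function also shows why PCIe 4.0 (16 GT/s per lane) needs all 32 lanes to keep a single 400Gb/s port saturated.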

InfiniBand OSFP-RHS

3. NDR Optical Connection Problem Solution

The physical form of NDR switch ports is OSFP, with eight channels per interface and each channel utilizing 100Gb/s SerDes. Therefore, in terms of connection speed, there are three mainstream options: 800G to 800G, 800G to 2X400G, and 800G to 4X200G. Additionally, each channel supports a downgrade from 100Gb/s to 50Gb/s, allowing for interconnectivity with the previous generation HDR devices (which use 50Gb/s SerDes), supporting configurations such as 400G to 2X200G.
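The lane arithmetic behind these options can be sketched in a few lines (an illustration of ours, not vendor tooling; the figures of 8 lanes per cage and 100/50Gb/s per lane come from the text):

```python
# Sketch of the NDR OSFP port math: 8 electrical lanes per OSFP cage,
# 100Gb/s per lane (NDR) or 50Gb/s per lane (HDR-compatible downgrade).

def port_options(lanes_per_cage: int = 8, lane_gbps: int = 100) -> dict:
    """Ways to split one OSFP cage into 1, 2, or 4 equal-speed logical ports."""
    total = lanes_per_cage * lane_gbps
    return {ports: total // ports for ports in (1, 2, 4)}

print(port_options())               # {1: 800, 2: 400, 4: 200} -> 800G / 2x400G / 4x200G
print(port_options(lane_gbps=50))   # {1: 400, 2: 200, 4: 100} -> HDR mode, e.g. 400G as 2x200G
```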

 

The NDR series cables and transceivers offer a wide range of product choices for configuring any network switch and adapter system, focusing on data center lengths of up to 2 kilometers for accelerating AI computing systems. To minimize data retransmission, the cables and transceivers feature low latency and extremely low bit error rate (BER) required for high-bandwidth AI and accelerated computing applications.

 

In terms of connector types, there are three main options: passive copper cables (DAC), active copper cables (ACC), and optical modules with jumpers. DAC supports transmission distances of 1-3 meters (2 meters for direct connections), ACC supports transmission distances of 3-5 meters, and multi-mode optical modules support a maximum transmission distance of 50 meters, while single-mode optical modules support a maximum transmission distance of 500 meters.
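The reach figures above can be folded into a small selection helper; this is a sketch of ours (function name and return strings are invented), with only the distance thresholds taken from the text:

```python
# Hypothetical helper mapping link distance to the NDR cabling options
# described above. Thresholds come from the text; names are ours.

def pick_media(distance_m: float) -> str:
    if distance_m <= 3:
        return "DAC (passive copper)"          # 1-3 m (2 m for direct connections)
    if distance_m <= 5:
        return "ACC (active copper)"           # 3-5 m
    if distance_m <= 50:
        return "multi-mode transceiver + MPO fiber"   # up to 50 m
    if distance_m <= 500:
        return "single-mode transceiver + fiber"      # up to 500 m
    raise ValueError("beyond 500 m: outside the NDR options listed")

print(pick_media(30))   # multi-mode transceiver + MPO fiber
```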

InfiniBand Optical Interconnection

4. NADDOD OSFP-800G-2XSR4H

  • Optical Transceiver Technology Introduction

The NADDOD OSFP-800G-2XSR4H is a 2x400Gb/s dual-port multi-mode parallel 8-channel transceiver. It utilizes 100G-PAM4 modulation and two 4-channel multi-mode MPO-12/APC fiber jumpers, with a maximum transmission distance of up to 50 meters. The dual-port 2xSR4 transceiver is a key innovation: two internal transceiver engines make it possible to realize 64 400Gb/s ports from the 32 OSFP cages of the Quantum-2 switch.

  • Module Usage Introduction

The OSFP-800G-2XSR4H dual-port transceiver is used to connect the 400G NDR InfiniBand Quantum-2 switch either to another switch or to ConnectX-7 adapters and BlueField-3 DPUs.

(1) NDR InfiniBand Quantum-2 Switch —— NDR InfiniBand Quantum-2 Switch

① Single 2x400Gb/s OSFP - Single 2x400Gb/s OSFP: Using OSFP-800G-2XSR4H transceivers and two direct-attach multi-mode MPO-12/APC fiber jumpers (M4MPOA12FB), the QM9700/QM9790 switches with OSFP ports are connected together at 800G (2x400G) speed, with a maximum transmission distance of up to 50m.

 

② Single 2x400Gb/s OSFP - Dual 2x400Gb/s OSFP: By using OSFP-800G-2XSR4H transceivers and two MPO fiber jumpers, it is possible to route to two different switches, creating two separate 400Gb/s links, and then route additional OSFP ports to more switches.

400G IB EN SWITCH-TO-SWITCH OSFP LINKS

(2) NDR InfiniBand Quantum-2 Switch —— 400G ConnectX-7(OSFP/QSFP112)Adapter/BlueField-3 DPU(QSFP112)

The dual-port OSFP-800G-2XSR4H transceiver can support a maximum of two 400G ConnectX-7 adapters and/or DPUs using two direct-attach multi-mode MPO-12/APC fiber jumpers (M4MPOA12FB). For the OSFP ports on the ConnectX-7 network card side, the OSFP-400G-SR4H optical modules are used, while for the QSFP112 ports on the ConnectX-7 network card/BlueField-3 DPU side, the Q112-400G-SR4H optical modules are used.

400G IB EN SWITCH-TO- 2 CONNECTX-7 AND BLUEFIELD-3

(3) NDR InfiniBand Quantum-2 Switch —— 200G ConnectX-7(OSFP/QSFP112)Adapter/BlueField-3 DPU(QSFP112)

The dual-port OSFP-800G-2XSR4H transceiver, when used with two 1:2 branching MPO fiber jumpers (M4MPOA2x4F), can support a maximum of four 200G ConnectX-7 adapters and/or DPUs. For the OSFP ports on the ConnectX-7 network card side, the OSFP-400G-SR4H optical modules are used, while for the QSFP112 ports on the ConnectX-7 network card/BlueField-3 DPU side, the Q112-400G-SR4H optical modules are used.

 

In the OSFP-400G-SR4H and Q112-400G-SR4H modules, only two channels within the 400G transceiver are activated, thereby creating 200G rate links.

400G IB & EN SWITCH-TO- 4 CONNECTX-7 AND BLUEFIELD-3

(4) NDR InfiniBand Quantum-2 Switch —— DGX H100 GPU Systems

The DGX-H100 features 8 "Hopper" H100 GPUs in the top chassis section and includes two CPUs, storage, and InfiniBand and/or Ethernet networking in the bottom server section. It includes eight 400Gb/s ConnectX-7 ICs installed on the interposer boards of two "Cedar-7" cards for GPU-to-GPU InfiniBand or Ethernet networking.

 

Each dual-port OSFP-800G-2XSR4H transceiver provides two 400G ConnectX-7 links from the DGX to the Quantum-2 switch. Compared to the DGX A100, this reduces the redundancy, complexity, and number of ConnectX-7 cards and transceivers: the DGX A100 uses 8 individual HCAs and 8 transceivers or AOCs, along with two additional ConnectX-6 adapters for InfiniBand or Ethernet storage.

 

Additionally, for traditional storage, clustering, and management networking, the DGX H100 supports up to four ConnectX-7 adapters and/or two BlueField-3 DPUs for storage I/O over InfiniBand and/or Ethernet, running at 400G or 200G with OSFP or QSFP112 devices.

400G IB & EN SWITCH-TO-DGX H100&CEDAR-7 LINKS

5. NADDOD OSFP-400G-SR4H

  • Optical Transceiver Technology Introduction

The NADDOD OSFP-400G-SR4H is a single-port, OSFP-packaged SR4 multi-mode parallel transceiver running at 400Gb/s with 100G-PAM4 modulation. When paired with a single 4-channel multi-mode MPO-12/APC fiber jumper, it can achieve a maximum transmission distance of 50 meters. If used with a 1:2 branching MPO fiber jumper, only two of the transceiver's four channels are activated on each branch end, automatically creating a 200G link and reducing power consumption.

  • Module Usage Introduction

The OSFP-400G-SR4H transceiver is used for the 400Gb/s ConnectX-7/OSFP PCIe bus network card and is connected to a single 800Gb/s dual-port 2x400G OSFP transceiver (OSFP-800G-2XSR4H) in the Quantum-2 InfiniBand switch.

 

The OSFP-400G-SR4H transceiver supports two different speeds: 400Gb/s and 200Gb/s, depending on the type of fiber jumper used for the connection:

 

① 400Gb/s mode: when paired with a direct-attach MPO-12/APC fiber jumper (M4MPOA12FB), the OSFP-400G-SR4H operates at its full 400Gb/s rate. In this configuration, the opposite end's OSFP-800G-2XSR4H transceiver is paired with two OSFP-400G-SR4H transceivers connected to two 400Gb/s ConnectX-7/OSFP adapter cards.

 

② 200Gb/s mode: When paired with a 1:2 branching MPO fiber jumper (M4MPOA2x4F), the OSFP-400G-SR4H operates at a speed of 200Gb/s (NDR200) and automatically reduces power consumption as only 2 channels are activated. In this mode, the opposite end's OSFP-800G-2XSR4H transceiver can be paired with four OSFP-400G-SR4H transceivers connected to four 200Gb/s ConnectX-7/OSFP adapter cards.
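The two modes above amount to a simple mapping from fiber jumper to link behavior. The sketch below is a hypothetical helper of ours; only the jumper part numbers and figures come from the text:

```python
# Hypothetical lookup (names are ours) summarizing how the fiber jumper
# choice sets the OSFP-400G-SR4H operating mode, per the text above.

MODES = {
    # jumper part number: (link speed in Gb/s, active channels per end)
    "M4MPOA12FB": (400, 4),   # direct MPO-12/APC jumper -> full 400G, all 4 channels
    "M4MPOA2x4F": (200, 2),   # 1:2 branching jumper -> NDR200, 2 channels, lower power
}

def sr4h_mode(jumper: str) -> dict:
    """Return the operating mode implied by the attached jumper."""
    speed, channels = MODES[jumper]
    return {"speed_gbps": speed, "active_channels": channels}

print(sr4h_mode("M4MPOA2x4F"))   # {'speed_gbps': 200, 'active_channels': 2}
```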

(1) NDR InfiniBand Quantum-2 Switch —— 2 x 400G ConnectX-7(OSFP)Adapter

The dual-port OSFP-800G-2XSR4H transceiver, when used with two direct-attach multi-mode MPO-12/APC fiber jumpers (M4MPOA12FB), can support a maximum of two ConnectX-7/OSFP adapters. Each adapter utilizes an OSFP-400G-SR4H transceiver and can achieve a maximum transmission distance of 50 meters.

400G IB EN SWITCH-TO- 2 CONNECTX-7 AND BLUEFIELD-3

(2) NDR InfiniBand Quantum-2 Switch —— 4x200G ConnectX-7(OSFP)Adapter

The dual-port OSFP-800G-2XSR4H transceiver, when used with two 1:2 branching MPO fiber jumpers (M4MPOA2x4F), can support a maximum of four ConnectX-7/OSFP adapters. Each adapter utilizes an OSFP-400G-SR4H transceiver operating at 200Gb/s, with a maximum transmission distance of 50 meters.

 

The ConnectX-7/OSFP adapter utilizes OSFP-400G-SR4H modules with only two channels activated, thereby creating a 200G rate link and automatically reducing power consumption.

400G IB & EN SWITCH-TO- 4 CONNECTX-7 AND BLUEFIELD-3

6. NADDOD O2Q56-400G-AOCH

  • AOC(Active Optical Cable)Technology Introduction

The NADDOD O2Q56-400G-AOCH is an OSFP to 2x QSFP56, 400Gb/s to 2x 200Gb/s active optical splitter cable (AOC). It maps the eight multi-mode fiber (MMF) pairs at the single OSFP end to four pairs at each of the two QSFP56 ends.

  • AOC(Active Optical Cable)Usage Introduction

The O2Q56-400G-AOCH is used to connect an NDR switch with OSFP ports to 2 HDR switches or HCA QSFP56 ports.

 

①NDR InfiniBand Quantum-2 Switch —— 2 x HDR InfiniBand Quantum Switch

 

②NDR InfiniBand Quantum-2 Switch —— 2 x 200G QSFP56 ConnectX-6 Adapter/BlueField-2 DPU

HDR CONNECTIVITY MATRIX-1

7. NADDOD Fiber Optic Patch Cable

The NADDOD M4MPOA12FB MPO-12/APC to MPO-12/APC (8-fiber) passive multi-mode fiber optic patch cable features 8 individual fibers with 4 fibers in each direction. The position numbering scheme is defined by the alignment key and alignment pin. The MPO connectors are of the 8-degree angled polished (APC) type, minimizing optical signal reflections and ensuring optimal signal integrity.

MPO-12 APC

The M4MPOA12FB is designed to interconnect two switches or connect switches to two network adapters. This cable can be used in conjunction with pluggable fiber optic 400GbE/NDR transceivers, such as the OSFP-800G-2XSR4H dual-port transceiver for InfiniBand and Ethernet systems at the switch end, as well as the OSFP-400G-SR4H or QSFP112-400G-SR4H transceivers in ConnectX-7 network adapters and BlueField-3 DPUs.

400G IB EN SWITCH-TO-SWITCH OSFP LINKS

  • MPO-8/APC to 2xMPO-4/APC(M4MPOA2x4F)

The NADDOD M4MPOA2x4F is a multi-mode 1:2 splitter cable: the four channels at the MPO-8/APC end split into two channels at each MPO-4/APC branch end. The MPO connectors are angled polished (APC), minimizing optical signal reflections and ensuring optimal signal integrity.

M4MPOA2x4F

The M4MPOA2x4F is used to connect two servers to a single port on a high-speed switch. This cable can be used with pluggable 400GbE/NDR fiber optic transceivers, such as the OSFP-800G-2XSR4H dual-port transceiver at the switch end of InfiniBand and Ethernet systems, and the OSFP-400G-SR4H or QSFP112-400G-SR4H transceivers in ConnectX-7 network adapters and BlueField-3 DPUs.

400G IB-EN SWITCH-TO- 4 CONNECTX-7 AND BLUEFIELD-3

8. Introduction to NADDOD Testing

NADDOD has a comprehensive testing environment that covers a range of professional testing equipment, from switches (NVIDIA QM87XX/QM97XX series switches) to network cards (ConnectX-6/7 series network cards) to GPU servers. It provides high-performance, low-power, low-latency NDR solutions. Additionally, NADDOD offers customized testing solutions to help clients build demo environments and conduct on-site testing using a wide range of products and devices. Through analysis, design, and implementation of testing scenarios, the feasibility of solutions is verified to meet various network topology requirements.

 

The NADDOD OSFP-800G-2XSR4H transceiver, paired with the NVIDIA InfiniBand QM97XX series switch, was connected to the NVIDIA ConnectX-7 VPI network card with OSFP-400G-SR4H optical modules. Hardware compatibility testing (plug/unplug, port restart, power restart), software compatibility testing (connectivity, shutdown/no shutdown, parameter testing), and performance testing (DDM, bit error rate, stability) were conducted. The link ran at full speed for one week with no packet loss, no errors, no CRC errors, and no link interruptions. All tested parameters comply with relevant industry standards.
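As a rough sanity check of our own (not part of NADDOD's test procedure), one error-free week at 400Gb/s puts an upper bound on the link's post-FEC bit error rate:

```python
# Illustrative arithmetic: what does "one week at full speed with no
# errors" imply about the post-FEC BER of a 400Gb/s link?

SECONDS_PER_WEEK = 7 * 24 * 3600
LINK_GBPS = 400

bits = LINK_GBPS * 1e9 * SECONDS_PER_WEEK   # ~2.42e17 bits transferred
implied_ber_bound = 1 / bits                # zero observed errors -> BER below ~4e-18

print(f"{bits:.3e} bits transferred, implied BER upper bound ~{implied_ber_bound:.1e}")
```

This is well below the ~1e-15 figure commonly targeted for post-FEC links, which is consistent with the "extremely low BER" claim for the NDR cable and transceiver family above.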

 

Here are the specific testing details:

  • Device information display and connectivity

NVIDIA QM9700 Switch Information Display

NVIDIA QM9700 Link Up Mode

NVIDIA MCX75310AAS-NEAT Network Card Information Display

NVIDIA MCX75310AAS-NEAT Network Card Link Up Mode

  • Transceiver information display

OSFP-800G-2XSR4H Information Display

OSFP-400G-SR4H Information Display

  • Bit Error Rate Test

NVIDIA QM9700 Switch Side

NVIDIA MCX75310AAS-NEAT Network Card Side

 

9. Summary

NADDOD is a leading provider of optical network solutions, specializing in optical connectivity and networking products. The company has a deep understanding of, and extensive project implementation experience in, building and accelerating high-performance InfiniBand networks. Based on users' different application scenarios, NADDOD offers optimal combinations of high-performance InfiniBand switches, intelligent network cards, and AOC/DAC/optical module products. These products and end-to-end solutions deliver significant advantages and value for data centers, high-performance computing, edge computing, artificial intelligence, and other applications, combining low cost with outstanding performance to greatly enhance customers' business acceleration capabilities.