Driving the Data Center Network Revolution: The Perfect Combination of 800G Optical Modules and NDR Switches

NADDOD | Gavin, InfiniBand Network Engineer | Oct 16, 2023

With the rapid development of technologies such as large models, cloud computing, and big data, data centers are in a period of unprecedented growth. The demand for training and deploying large models keeps rising, placing enormous pressure on computing, storage, and network infrastructure. From large deep learning models such as GPT-4, to massive workloads on cloud computing platforms, to large-scale data analysis and high-performance computing applications, all of these depend on high-performance data center networks for high-speed data transmission and processing.

Building a high-speed data center network involves multiple key components, including high-rate network cards, optical modules, switches, and high-performance interconnect technologies. Within this complex ecosystem, InfiniBand (IB) has emerged as the market leader and a crucial means of achieving high-speed data transfer and low-latency communication.

Currently, NDR (400G) IB devices are widely deployed and have become the preferred choice for the high-speed data center networks behind large models and high-performance computing. On the switch side, the primary devices are NVIDIA's QM9700 and QM9790 series. Built on NVIDIA Quantum-2, these switch systems provide an impressive 64 NDR 400Gb/s InfiniBand ports in a standard 1U form factor. A single switch can therefore handle an aggregate bidirectional throughput of 51.2 terabits per second (Tb/s), with a landmark capacity of more than 66.5 billion packets per second (BPPS).
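The headline numbers follow directly from the port count; a quick back-of-the-envelope check:

    # 64 NDR ports x 400 Gb/s each, counted in both directions:
    echo $(( 64 * 400 ))       # 25600 Gb/s = 25.6 Tb/s per direction
    echo $(( 64 * 400 * 2 ))   # 51200 Gb/s = 51.2 Tb/s aggregate bidirectional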

NVIDIA Quantum-2 InfiniBand switches not only support NDR high-rate transmission but also offer high throughput, in-network computing, intelligent acceleration engines, flexibility, and robust architecture. These features make them an ideal choice for high-performance computing (HPC), artificial intelligence, and hyperscale cloud infrastructure applications. At the same time, NDR switches contribute to reducing overall costs and complexity, further driving the innovation and development of data center networks.

The Difference Between QM9700 and QM9790

As with previous generations of IB switches, the NDR line pairs a managed switch, the QM9700, with an unmanaged switch, the QM9790. The functional difference is that the managed switch runs a Network Operating System (NOS), much like a regular Ethernet switch: it can be accessed and configured directly through a dedicated management port, and it provides subnet manager functionality (enabled as needed). The unmanaged switch has no CPU at the hardware level and runs no NOS; it is configured through a remote configuration tool called mlxconfig. Below are images of the QM9700 (with a management interface on the far right) and the QM9790:

Image 1: QM9700 (managed switch)

Image 2: QM9790 (unmanaged switch)

There are also operational differences between the two. As a managed switch, the QM9700 allows direct login for configuration and management. Port and module information can be queried with the commands below; a consolidated session sketch follows the list:

  • Querying port information: show interface ib 1/1/1 (using port 1/1/1 as an example).

  • Querying port module information: show interface ib 1/1/1 transceiver.

  • Querying port module DDM (Digital Diagnostic Monitoring): show interface ib 1/1/1 transceiver diagnostics.
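Taken together, a query session on the QM9700 might look like the following sketch. The commands are the ones listed above; port 1/1/1 is the running example, and the actual output is omitted:

    show interface ib 1/1/1                          # link state and speed for port 1/1/1
    show interface ib 1/1/1 transceiver              # module vendor, part number, and type
    show interface ib 1/1/1 transceiver diagnostics  # DDM readings (e.g., temperature, Tx/Rx power)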

 

For the unmanaged QM9790, configuration and management are done by logging into a connected server (or another managed switch). The steps below outline the process; a consolidated session sketch follows the list:

  • Enter "fae" mode.

  • Run "ibswitches" to obtain the LID of the connected switch (lid-1 is used as the example below).

  • Query module information: mlxlink -d lid-1 -p 1 -m (module information for port 1).

  • Enable or disable port splitting: mlxconfig -d lid-1 set SPLIT_MODE=1 (0 to disable).

  • Enable or disable splitting functionality for a specific range of ports: mlxconfig -d lid-1 set SPLIT_PORT[1..32]=1 (0 to disable).
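Putting those steps together, a session from the attached server might look like this sketch. The commands and the lid-1 example are taken from the list above; real LID values come from the ibswitches output:

    fae                                          # enter fae mode
    ibswitches                                   # list the fabric's switches and their LIDs
    mlxlink -d lid-1 -p 1 -m                     # query module information for port 1
    mlxconfig -d lid-1 set SPLIT_MODE=1          # enable port splitting (0 to disable)
    mlxconfig -d lid-1 set SPLIT_PORT[1..32]=1   # enable splitting on ports 1-32 (0 to disable)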

Switch-Side Module: OSFP 800G Optical Transceiver

Due to size and power constraints, the QM9700/QM9790 series switches are limited to 32 OSFP cages. Each OSFP cage actually exposes two independent 400G interfaces, which NVIDIA calls twin-port 400G. To complement these switches, NADDOD has introduced the OSFP-800G-2xSR4H module, shown in the image below:

NADDOD OSFP 800G Module

The OSFP-800G-2xSR4H is a dual-port OSFP optical module designed for InfiniBand. It connects to other devices through two MPO-12/APC jumpers, with each port operating at 400Gb/s. The dual-port design is the key innovation: two internal transceiver engines fully unleash the switch's potential, allowing the 32 physical cages to provide up to 64 400G NDR interfaces. This high-density, high-bandwidth design lets data centers meet the growing network demands of applications such as high-performance computing, artificial intelligence, and cloud infrastructure. The image below illustrates the module interface:

NADDOD OSFP 800G Module Interface
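The density claim above is easy to verify with the same kind of back-of-the-envelope arithmetic (a sketch, not vendor data):

    # 32 OSFP cages, each housing two independent 400G transceiver engines:
    echo $(( 32 * 2 ))         # 64 NDR 400G interfaces per 1U switch
    echo $(( 32 * 2 * 400 ))   # 25600 Gb/s of front-panel capacity per direction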

NADDOD has conducted extensive batch testing of the OSFP-800G-2xSR4H module on the 9700/9790 series switches, and the performance results have been exceptional. Here are some of the test results:

Images: NADDOD OSFP 800G module test results 1-4

The OSFP-800G-2xSR4H optical module offered by NADDOD delivers high performance and reliability, providing robust optical interconnect solutions for data centers. It enables data centers to fully leverage the performance potential of the QM9700/9790 series switches, facilitating high-bandwidth and low-latency data transmission.