NADDOD NDR InfiniBand Network Solution for High-Performance Networks - NADDOD Blog

NADDOD Abel InfiniBand Expert Apr 3, 2023

In recent years, artificial intelligence (AI) has developed rapidly, and AI applications such as ChatGPT have gradually changed the way people work and live. As training data sets keep growing and deep learning algorithms keep improving, the computing resources required to train large language models (LLMs) are increasing as well.

Training large language models requires a significant amount of computing resources, including CPUs, GPUs, and DPUs. These resources are connected to the training servers through a network, so the network's bandwidth and latency directly affect training speed and efficiency. In response to these network bottlenecks, NVIDIA launched the Quantum-2 InfiniBand NDR platform, which gives AI developers and scientific researchers high network performance and a rich feature set to help them solve challenging problems.

NADDOD NDR InfiniBand Network Solution for HPC, AI and Data Centers

NADDOD provides customers with NDR (Next Data Rate) network solutions based on the NVIDIA Quantum-2 InfiniBand platform, drawing on our understanding of high-speed network trends and our experience in implementing high-performance network projects. NADDOD's NDR solution mainly includes the Quantum-2 InfiniBand 800G switch (2x400G NDR interface), the ConnectX-7 InfiniBand host network adapter, and Mellanox LinkX InfiniBand optical modules and cables. Together they deliver low-latency, high-bandwidth networking for critical fields such as high-performance computing (HPC), hyperscale cloud data centers, and artificial intelligence.

NADDOD NDR Networking Solution

Quantum-2 InfiniBand NDR Switch

NADDOD offers the NVIDIA Quantum-2 QM9700 and QM9790 switches. The QM9700 switch is currently deployed in NADDOD's Artificial Intelligence and High-Performance Computing R&D and Test Center, where it is used for NADDOD 800G (2x 400G NDR) network technology R&D and product reliability testing in support of the network acceleration services offered to users.

NADDOD NVIDIA Quantum-2 QM9700 and QM9790 switches
The QM9700 and QM9790 switches use a standard 1U fixed-configuration chassis with 32 800G physical interfaces and support 64 NDR 400Gb/s InfiniBand ports, which can be split into up to 128 200Gb/s ports. Quantum-2 NDR switches support third-generation NVIDIA SHARP, advanced congestion control, adaptive routing, and self-healing network technologies. Compared with the previous generation of 200G HDR switches, the QM8700 and QM8790, NDR switches achieve twice the port speed, three times the switch port density, five times the switch system capacity, and 32 times the switch AI acceleration capability.

InfiniBand Quantum-2 NDR switches
The QM9700 and QM9790 InfiniBand NDR switches are available in air-cooled and liquid-cooled models, and in internally managed and externally managed (also known as unmanaged) models. Each switch supports 51.2Tb/s of aggregate bidirectional bandwidth and a throughput of more than 66.5 billion packets per second (BPPS). The switching capacity is about 5 times that of the previous generation Quantum-1.
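The port count and bandwidth figures quoted for these switches follow from simple arithmetic. The short Python sketch below reproduces them; the function and constant names are illustrative only and are not part of any NVIDIA or NADDOD tool.

```python
# Illustrative arithmetic only; the names here are hypothetical, not a vendor API.
OSFP_CAGES = 32        # 800G physical interfaces per 1U switch (from the text)
PORTS_PER_CAGE = 2     # each 800G interface carries 2x NDR ports
NDR_GBPS = 400         # per-port NDR rate in Gb/s


def switch_bandwidth_tbps(cages: int, ports_per_cage: int, port_gbps: int):
    """Return (port count, unidirectional Tb/s, bidirectional Tb/s)."""
    ports = cages * ports_per_cage
    one_way_tbps = ports * port_gbps / 1000
    return ports, one_way_tbps, 2 * one_way_tbps


ports, one_way, two_way = switch_bandwidth_tbps(OSFP_CAGES, PORTS_PER_CAGE, NDR_GBPS)
print(ports, one_way, two_way)  # 64 25.6 51.2

# Splitting every 400Gb/s port into two 200Gb/s ports doubles the port count
# but leaves the aggregate bandwidth unchanged.
split_ports, _, split_two_way = switch_bandwidth_tbps(OSFP_CAGES, 4, 200)
print(split_ports, split_two_way)  # 128 51.2
```

The 51.2Tb/s figure is therefore a bidirectional total: 25.6Tb/s in each direction across all 64 ports.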

In-network computing accelerated supercomputing
As rack-mounted InfiniBand switches, the QM9700 and QM9790 are flexible and can support various network topologies such as Fat Tree, DragonFly+, and multidimensional Torus. They are also backward compatible with previous generations of products and have extensive software support.
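As a rough sizing aid for the Fat Tree case, the sketch below applies the standard formula for a non-blocking fat tree built from identical switches, which gives k²/2 end ports for two tiers and k³/4 for three tiers of radix-k switches. The function name is hypothetical, and real deployments may use oversubscription or reserve ports, so the results are upper bounds rather than NADDOD design figures.

```python
def fat_tree_max_hosts(radix: int, tiers: int) -> int:
    """Upper bound on end ports in a non-blocking fat tree of identical switches.

    Every tier below the top splits its ports half down and half up, and the
    top tier uses all ports downward, giving 2 * (radix / 2) ** tiers hosts.
    """
    if radix < 2 or radix % 2 or tiers < 1:
        raise ValueError("radix must be even and >= 2, tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


# 64 is the NDR 400Gb/s port count per switch quoted in this article.
for tiers in (2, 3):
    print(tiers, fat_tree_max_hosts(64, tiers))
# 2 2048
# 3 65536
```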

Quantum-2 ConnectX-7 SmartNIC NDR Network Adapter

NADDOD provides single-port and dual-port NVIDIA ConnectX-7 smart network adapters with InfiniBand NDR or NDR200 on the adapter side, using NVIDIA Mellanox Socket Direct® technology to provide 32 lanes of PCIe Gen4.

ConnectX-7 is based on a 7nm lithography process and contains 8 billion transistors. Its data transfer rate is twice that of NVIDIA ConnectX-6, the world’s leading high-performance computing network chip. It also doubles the performance of RDMA, GPUDirect® Storage, GPUDirect RDMA and In-network Computing.

The NDR Host Channel Adapter (HCA) also includes multiple programmable cores that offload data preprocessing algorithms and application control paths from the CPU or GPU to the network, providing higher performance, better scalability, and more overlap between compute and communication tasks. It can meet the needs of traditional enterprises as well as the world's most demanding artificial intelligence, scientific computing, and hyperscale cloud data center workloads.

Mellanox LinkX InfiniBand NDR Optical Transceivers and Cables

NADDOD provides flexible 400Gb/s NDR InfiniBand connection solutions, including single-mode and multi-mode transceivers, MPO fiber patch cords, active copper cables (ACC), and passive direct attach copper cables (DAC), to meet the needs of various network topologies.

Dual-port transceivers with finned OSFP connectors are suitable for air-cooled fixed configuration switches, while dual-port transceivers with flat OSFP connectors are suitable for liquid-cooled modular switches and HCAs.

For switch-to-switch interconnection, you can use the new OSFP 2xNDR (800Gbps) optical module to connect two QM97XX switches. The finned design greatly improves the heat dissipation of the optical module.

For switch-to-HCA interconnection, the switch end uses a finned-top OSFP 2xNDR (800Gbps) optical module, and the network adapter end uses a flat-top OSFP 400Gbps optical module. MPO fiber patch cords are available in lengths of 3-150 meters, and an MPO to 2xMPO crossover fiber splitter cable is available in lengths of 3-50 meters.

Mellanox LinkX InfiniBand NDR Optical Transceivers
The switch and HCA can also be connected with a direct attach copper cable (DAC) up to 1.5 meters long or an active copper cable (ACC) up to 3 meters long. One NDR 800Gb/s to 2x 400Gb/s passive copper splitter cable can connect a switch OSFP interface (which carries two 400Gb/s NDR InfiniBand ports) to two standalone 400Gb/s HCAs with OSFP or QSFP112 interfaces. One NDR 800Gb/s to 4x 200Gb/s passive copper splitter cable can connect a switch OSFP interface to four 200Gb/s HCA ports.
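The reach figures quoted in this article (DAC up to 1.5 meters, ACC up to 3 meters, MPO fiber up to 150 meters) can be turned into a simple selection rule. The Python sketch below is one possible way to encode it; the function name and return labels are hypothetical, and an actual choice would also depend on cost, power, and connector type.

```python
def pick_switch_to_hca_media(distance_m: float) -> str:
    """Suggest a link medium using only the reach limits quoted in this article."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= 1.5:
        return "DAC"       # passive copper, up to 1.5 m
    if distance_m <= 3:
        return "ACC"       # active copper, up to 3 m
    if distance_m <= 150:
        return "optical"   # transceivers plus MPO fiber, up to 150 m
    raise ValueError("beyond the 150 m reach quoted for MPO fiber")


for d in (1.0, 2.5, 30):
    print(d, pick_switch_to_hca_media(d))
# 1.0 DAC
# 2.5 ACC
# 30 optical
```

Preferring copper wherever it reaches is an assumption of this sketch; it reflects the common practice of using the simplest medium that covers the distance.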

NDR passive copper splitter cable

InfiniBand 400Gb/s NDR Benefits

The NVIDIA Quantum-2 InfiniBand platform raises the data rate available for high-performance networking: each port can achieve a transmission speed of 400Gb/s NDR.

Port density

Compared with the previous generation of 200G HDR, and with the help of NVIDIA port splitting technology, the port speed is doubled, the switch port density is tripled, and the switch system capacity is five times higher. A Dragonfly+ topology built on a Quantum-2 network can connect more than one million nodes at 400Gb/s within 3 hops, while reducing power consumption, latency, and space requirements.

Performance

The third-generation NVIDIA SHARP technology, namely SHARPv3, has been introduced. SHARPv3 creates nearly unlimited scalability for large-scale data aggregation through a scalable network, supports up to 64 parallel streams, and improves AI acceleration by 32 times compared to the previous generation of 200G HDR products.

Cost

Compared with the previous generation HDR, NDR equipment reduces network complexity and improves efficiency, and cables and network interface cards (NICs) can be replaced directly when a later rate upgrade is performed. An NDR network needs fewer devices to support a network of the same scale, which is more cost-effective for both the overall budget and later investment.

If you are interested in NDR 400G InfiniBand high-speed optical networks, or need technical support for NDR 400G network projects, please contact NADDOD for free product and solution support.

Related Resources:
NVIDIA Quantum-2 InfiniBand NDR 400Gb/s
InfiniBand NDR: The Future of High-Speed Data Transfer in HPC and AI
NADDOD 200G InfiniBand HDR AOC vs OEM
MMA4Z00-NS: The InfiniBand NDR Transceiver for 800G High-Speed Data Transfer