Adaptive routing, in-network computing, and congestion control empower InfiniBand to meet the rigorous demands of HPC and AI clusters. These optimizations ensure seamless data flow, eliminate bottlenecks, and enable efficient resource utilization, driving superior performance and operational efficiency for complex infrastructures.
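To make the adaptive-routing idea concrete, the toy sketch below steers each flow onto the least-loaded of several equal-cost uplinks rather than a fixed, hash-selected one. This is purely an illustration of the concept, not NVIDIA's switch logic; the port names and byte counts are hypothetical.

```python
# Toy sketch of adaptive routing: pick the least-loaded of several
# equal-cost uplinks for each flow. Port names and load values are
# hypothetical illustration data, not a real switch implementation.
from collections import defaultdict

def least_loaded(uplinks, load):
    """Return the uplink with the fewest queued bytes."""
    return min(uplinks, key=lambda port: load[port])

uplinks = ["port1", "port2", "port3", "port4"]
load = defaultdict(int)  # queued bytes per uplink

for flow_bytes in (800, 1200, 400, 1500, 700):
    chosen = least_loaded(uplinks, load)
    load[chosen] += flow_bytes  # congestion feedback updates the picture
    print(f"{flow_bytes} B -> {chosen}")
```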
Fat-tree topology is widely recognized as an optimal architecture for InfiniBand-based AI GPU clusters, ensuring consistent bandwidth and high throughput for large-scale deployments. Leveraging cutting-edge hardware such as NVIDIA H100 and H200 GPUs, the DGX platform, and emerging solutions like GB200, this topology is particularly well suited to intensive AI workloads. For example, with Quantum-2 switches and ConnectX-7 adapters/NICs providing eight single 400G ports per node, a 3-tier Fat-tree setup can scale to support up to 65,000 GPUs, while the more common 2-tier configuration efficiently handles clusters of up to 2,000 GPUs.
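Those cluster sizes follow from simple fat-tree arithmetic. The sketch below is a back-of-the-envelope check, assuming non-blocking 64-port switches (Quantum-2 class) and one 400G port per GPU endpoint:

```python
# Back-of-the-envelope fat-tree sizing, assuming a non-blocking topology
# built from 64-port switches with one 400G port per GPU endpoint.
def fat_tree_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoints of a non-blocking fat-tree built from
    switches with `radix` ports."""
    if tiers == 1:
        return radix
    return (radix // 2) ** (tiers - 1) * radix

radix = 64  # Quantum-2: up to 64 x 400Gb/s ports
print(fat_tree_endpoints(radix, 2))  # 2048  -> "up to 2,000 GPUs"
print(fat_tree_endpoints(radix, 3))  # 65536 -> "up to 65,000 GPUs"
```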
Flexible connectivity solutions can be tailored to varying AI cluster sizes, data center layouts, and connection distances. For small-scale clusters, multimode transceivers offer cost-effective, reliable performance over short distances. For mid-to-large clusters, single-mode transceivers enable stable, long-distance connections, while DAC cables lower cost and power consumption; together they provide an efficient solution, though DAC cables require careful layout planning due to their shorter reach and thicker cabling.
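One way to frame the choice is by link distance. The helper below is only an illustration; the reach thresholds are typical ballpark figures for 400G links, not specific product specifications.

```python
# Illustrative media selection by link distance. The reach values below
# are rough, typical figures for 400G links, not vendor-guaranteed specs.
def pick_media(distance_m: float) -> str:
    if distance_m <= 3:        # passive DAC: lowest cost and power, thick cable
        return "DAC (passive copper)"
    if distance_m <= 50:       # multimode optics for short in-rack/in-row runs
        return "multimode transceiver"
    return "single-mode transceiver"  # long runs across the data center

for d in (1, 2.5, 30, 500):
    print(f"{d} m -> {pick_media(d)}")
```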
80% of AI training interruptions stem from network-side issues.
95% of network problems are linked to faulty optical interconnects.
InfiniBand NDR Optics
NVIDIA Quantum-2 connectivity options enable flexible topologies with a variety of transceivers, MPO connectors, ACCs, and DACs featuring 1-to-2 or 1-to-4 splitter options. Backward compatibility connects 400Gb/s clusters to existing 200Gb/s or 100Gb/s infrastructures, ensuring seamless scalability and integration.
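For reference, the splitter options mentioned above can be summarized as follows. This is a simple sketch; the exact cable and transceiver choices vary by deployment.

```python
# Breakout options for a single 400Gb/s OSFP switch port, as referenced
# above. Each breakout still occupies one 400G port on the switch.
breakouts = {
    "1x400G": (1, 400),  # straight NDR link
    "2x200G": (2, 200),  # 1-to-2 splitter
    "4x100G": (4, 100),  # 1-to-4 splitter
}

for name, (links, speed) in breakouts.items():
    assert links * speed == 400  # aggregate never exceeds the port speed
    print(f"{name}: {links} downstream link(s) at {speed}G each")
```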
ConnectX-7 Adapter
The NVIDIA ConnectX-7 InfiniBand adapter delivers unmatched performance for AI and HPC workloads. Supporting PCIe Gen4 and Gen5, it offers single or dual network ports with speeds of up to 400Gb/s, available in multiple form factors to meet diverse deployment needs.
Advanced In-Network Computing capabilities and programmable engines are built into the ConnectX-7, enabling data algorithms to be preprocessed and application control paths to be offloaded directly to the network. These features optimize performance, reduce latency, and enhance scalability for demanding applications.
Quantum-2 Switches
The NVIDIA Quantum-2 switches support up to 64 400Gb/s ports or 128 200Gb/s ports using 32 OSFP connectors. The compact 1U design is available with air-cooled or liquid-cooled options, providing flexibility for internal or external management.
Delivering an aggregate 51.2 Tb/s of bidirectional throughput and handling over 66.5 billion packets per second (Bpps), Quantum-2 switches meet the demands of high-performance AI and HPC networks.
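The throughput figure is consistent with simple port arithmetic, as the quick check below shows (64 ports at 400 Gb/s, counted in both directions):

```python
# Quick check of the aggregate throughput figure:
# 64 ports x 400 Gb/s, counted bidirectionally.
ports, port_speed_gbps = 64, 400
aggregate_tbps = ports * port_speed_gbps * 2 / 1000  # x2 for bidirectional
print(aggregate_tbps)  # 51.2
```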
Partner with NADDOD to Accelerate Your InfiniBand Network for Next-Gen AI Innovation