Purpose-Built for High Performance AI Workloads

Adaptive routing, in-network computing, and a congestion control architecture empower InfiniBand to meet the rigorous demands of HPC and AI clusters. These optimizations ensure seamless data flow, eliminate bottlenecks, and enable efficient resource utilization, delivering superior performance and operational efficiency for complex infrastructures.

High Performance, Low Latency

InfiniBand achieves end-to-end latency as low as 2 µs and switch latency down to 230 nanoseconds (NDR), ideal for AI/ML workloads that rely on rapid data processing. This reduces communication delays, accelerating model training and inference cycles.

Lossless Transmission with Credit-Based Flow Control

With credit-based flow control, InfiniBand provides a truly lossless network, preventing packet loss and ensuring that no data is dropped during transfer, which is key for reliable, large-scale data handling.
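
The mechanism can be pictured with a minimal sketch (illustrative Python, not NADDOD or NVIDIA code): the receiver grants credits matching its free buffer space, and the sender pauses instead of dropping packets when credits run out.

```python
from collections import deque

class CreditLink:
    """Toy model of link-level, credit-based flow control.

    The receiver advertises credits equal to its free buffer slots; the
    sender transmits only while it holds credits, so the link never drops
    a packet due to buffer overflow (the sender simply waits instead).
    """

    def __init__(self, receiver_buffer_slots: int):
        self.credits = receiver_buffer_slots      # credits granted to the sender
        self.rx_buffer = deque()

    def try_send(self, packet: str) -> bool:
        if self.credits == 0:
            return False                          # back-pressure: wait, don't drop
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_consumes(self) -> None:
        if self.rx_buffer:
            self.rx_buffer.popleft()
            self.credits += 1                     # credit returned to the sender


link = CreditLink(receiver_buffer_slots=2)
assert link.try_send("pkt0") and link.try_send("pkt1")
assert not link.try_send("pkt2")                  # no credit: sender stalls, nothing is lost
link.receiver_consumes()                          # buffer drains, credit returns
assert link.try_send("pkt2")
```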

Adaptive Routing for Optimal Load Distribution

Adaptive multipath routing dynamically balances traffic by selecting optimal paths based on real-time congestion. This reduces bottlenecks, enhances throughput, and improves overall network efficiency, making InfiniBand ideal for environments with fluctuating data loads.
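
As a rough illustration of the idea (a hypothetical Python sketch, not actual switch firmware), an adaptive router forwards on the least-congested of several equal-cost next hops rather than hashing every flow onto a fixed path:

```python
import random

def pick_adaptive_path(paths: list, queue_depth: dict) -> str:
    """Toy adaptive-routing decision: among equal-cost paths, forward on the
    port whose egress queue is currently shallowest (ties broken randomly)."""
    least_loaded = min(queue_depth[p] for p in paths)
    candidates = [p for p in paths if queue_depth[p] == least_loaded]
    return random.choice(candidates)

# Example: spine "S2" is congested, so traffic shifts to S1/S3 automatically.
paths = ["S1", "S2", "S3"]
queue_depth = {"S1": 3, "S2": 40, "S3": 3}
print(pick_adaptive_path(paths, queue_depth))     # -> "S1" or "S3", never the hot "S2"
```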

In-Network Computing with SHARP Protocol

The Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) enables in-network data aggregation, reducing latency and data movement. Offloading collective operations from the CPU to the network, SHARP improves data throughput and maximizes bandwidth utilization, accelerating compute-intensive tasks.
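
The following toy Python sketch illustrates the general idea of in-network aggregation rather than the SHARP protocol itself: each "switch" sums its children's vectors as data passes through, so only one aggregated result travels on each uplink instead of every node's full payload.

```python
def switch_reduce(child_vectors):
    """Element-wise sum performed 'inside' a switch as data flows through."""
    return [sum(vals) for vals in zip(*child_vectors)]

# 8 nodes, each contributing a gradient-like vector.
node_data = [[i, 2 * i, 3 * i] for i in range(8)]

# Leaf switches aggregate 4 nodes each; the spine aggregates the two leaves.
leaf0 = switch_reduce(node_data[:4])
leaf1 = switch_reduce(node_data[4:])
spine = switch_reduce([leaf0, leaf1])

print(spine)   # [28, 56, 84] == element-wise sum over all 8 nodes
# Host-based reduction would converge 8 full vectors on one node;
# with in-network aggregation, each uplink carries a single partial sum.
```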

Scalability with Flexible Topologies

InfiniBand supports up to 48,000 nodes in a single subnet and eliminates ARP and broadcast overhead. Advanced topologies, including Fat-Tree, Dragonfly+, and multi-dimensional Torus, provide flexible, high-performance configurations tailored to specific application needs.

Stability and Resilience with Self-Healing Technology

Self-Healing Networking technology reduces network recovery times to as little as 1 millisecond, ensuring high availability and resilience, which is critical for uninterrupted AI and data-intensive operations.

Scalable Architecture for Peak AI Performance

InfiniBand Fat-tree 2-tier Topology in AI Networking

Fat-tree topology is widely recognized as an optimal architecture for InfiniBand-based AI GPU clusters, ensuring consistent bandwidth and high throughput for large-scale deployments. Paired with cutting-edge hardware such as NVIDIA H100 and H200 GPUs, the DGX platform, and emerging solutions such as the GB200, this topology is particularly well suited to intensive AI workloads. For example, with Quantum-2 switches and ConnectX-7 adapters/NICs providing eight single-port 400G connections per node, a 3-tier Fat-tree can scale to support up to 65,000 GPUs, while the more common 2-tier configuration efficiently handles clusters of up to 2,000 GPUs.
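
As a rough sizing sketch (assuming a non-blocking fabric built from 64-port switches such as Quantum-2, and ignoring rails, oversubscription, and spare ports), the endpoint counts quoted above follow directly from the switch radix:

```python
def fat_tree_endpoints(radix: int, tiers: int) -> int:
    """Rough endpoint capacity of a non-blocking fat-tree built from
    switches with `radix` ports (ignores rails, oversubscription, spares).

    2-tier (leaf-spine): each leaf splits its ports half down / half up,
    and each spine port reaches one leaf  ->  radix^2 / 2 endpoints.
    3-tier:                                   radix^3 / 4 endpoints.
    """
    if tiers == 2:
        return radix ** 2 // 2
    if tiers == 3:
        return radix ** 3 // 4
    raise ValueError("sketch only covers 2- and 3-tier fabrics")

print(fat_tree_endpoints(64, 2))   # 2048  -> roughly the "2,000 GPUs" 2-tier figure
print(fat_tree_endpoints(64, 3))   # 65536 -> roughly the "65,000 GPUs" 3-tier figure
```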

NADDOD InfiniBand Network Solutions for AI/ML Application

Flexible solutions tailored to varying AI cluster sizes, data center layouts, and connection distances.

800G SR8 InfiniBand solution for AI data center

Small-Scale InfiniBand Clusters - Multimode Solutions

Multimode transceivers offer cost-effective, reliable performance for short distances.

Use Case:
Typical deployments where Server-to-Leaf and Leaf-to-Spine connections are under 50 meters.
Products:
Leaf-to-Spine Switches: 800G OSFP 2xSR4 multimode transceivers
Server-to-Leaf Switches: 400G OSFP SR4 multimode transceivers

Mid-to-large AI Clusters - Single-Mode + DAC Cable Solutions

Single-mode transceivers enable stable, long-distance connections, while DAC cables lower costs and power consumption. Together, they provide an efficient solution for mid-to-large clusters. DAC cables require careful layout planning due to shorter distances and thicker cabling.

800G DR8 InfiniBand solution for AI data center

Single-Mode Transceivers + 800G DAC/ACC Cables

Use Case:
Leaf and Spine switches colocated or in adjacent racks for short-distance DAC connections. Single-mode transceivers handle longer server-to-Leaf distances with high-speed, low-latency performance.
Products:
Leaf-to-Spine Switches: 800G OSFP DAC/ACC cables (support up to 5 meters)
Server-to-Leaf Switches: 800G OSFP 2xDR4 and 400G OSFP DR4 single-mode transceivers (both support up to 100 meters)

800G FR8 InfiniBand solution for AI data center

Single-Mode Transceivers + Breakout DAC/ACC Cables

Use Case:
Breakout DAC cables connect servers to Leaf switches in adjacent racks. For Leaf-to-Spine distances exceeding 50 meters and up to 2 kilometers, single-mode transceivers provide reliable, high-performance connectivity.
Products:
Server-to-Leaf Switches: 800G OSFP breakout DAC/ACC cables (support up to 5 meters)
Leaf-to-Spine Switches: 800G OSFP 2xFR4 (supports up to 2 kilometers; suitable for inter-building connections) or 800G OSFP 2xDR4 (optimized for distances under 500 meters with high port density)
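
The reach figures quoted in the three solutions above can be collapsed into a simple planning hint. The following Python helper is purely illustrative and uses only the distance thresholds stated on this page:

```python
def suggest_800g_link(distance_m: float) -> str:
    """Toy media picker based only on the reach figures quoted above
    (a planning hint, not an exhaustive selection rule)."""
    if distance_m <= 5:
        return "800G OSFP DAC/ACC cable (lowest cost and power)"
    if distance_m <= 50:
        return "800G OSFP 2xSR4 multimode transceiver"
    if distance_m <= 500:
        return "800G OSFP 2xDR4 single-mode transceiver"
    if distance_m <= 2000:
        return "800G OSFP 2xFR4 single-mode transceiver"
    return "beyond 2 km: outside the options listed here"

for d in (3, 30, 80, 1500):
    print(f"{d:>4} m -> {suggest_800g_link(d)}")
```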

Common Network Issues Affecting AI Training Efficiency

80% of AI Training Interruptions Stem from Network-Side Issues

95% of Network Problems Are Linked to Faulty Optical Interconnects


NADDOD - Safeguarding AI Clusters from Training Interruptions

Broadcom DSP and VCSEL Deliver Ultra-Low BER and Stability

Rigorous Compatibility Across NVIDIA Ecosystems

Industry-Leading Manufacturing Ensures Consistent Quality and Fast Delivery

Comprehensive Product Portfolio with Customizable Solutions

Extensive Expertise and Dedicated Support for InfiniBand Cluster Deployments

NADDOD InfiniBand Product Portfolio for AI Workloads

InfiniBand Transceivers and Cables

NVIDIA Quantum-2 connectivity options enable flexible topologies with a variety of transceivers, MPO connectors, ACCs, and DACs featuring 1:2 or 1:4 splitter options. Backward compatibility connects 400Gb/s clusters to existing 200Gb/s or 100Gb/s infrastructures, ensuring seamless scalability and integration.

InfiniBand NDR Optics

InfiniBand Adapters/NICs

The NVIDIA ConnectX-7 InfiniBand adapter delivers unmatched performance for AI and HPC workloads. Supporting PCIe Gen4 and Gen5, it offers single or dual network ports with speeds of up to 400Gb/s, available in multiple form factors to meet diverse deployment needs.

Advanced In-Network Computing capabilities and programmable engines are built into the ConnectX-7, enabling data algorithms to be preprocessed and application control paths to be offloaded directly to the network. These features optimize performance, reduce latency, and enhance scalability for demanding applications.

ConnectX-7 Adapter

InfiniBand Switches

The NVIDIA Quantum-2 switches support up to 64 400Gb/s ports or 128 200Gb/s ports using 32 OSFP connectors. The compact 1U design is available with air-cooled or liquid-cooled options, providing flexibility for internal or external management.

Delivering an aggregated 51.2 Tb/s bidirectional throughput and handling over 66.5 billion packets per second (bpps), Quantum-2 switches meet the demands of high-performance AI and HPC networks.
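
The aggregate figure follows directly from the port count: 64 ports at 400 Gb/s, counted in both directions. A quick check:

```python
# Sanity check of the aggregate bandwidth quoted above:
# 64 ports x 400 Gb/s, counted in both directions.
ports = 64
port_speed_gbps = 400
bidirectional_tbps = ports * port_speed_gbps * 2 / 1000
print(bidirectional_tbps)   # 51.2 (Tb/s)
```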

Quantum-2 Switches

What Customers Say

Our InfiniBand setup is using NADDOD's transceivers and cables—rock-solid performance!
We use NADDOD optics for our InfiniBand setup. Great quality and performance.
Perfect! 👍 Can’t imagine an easier solution for our infrastructure.
Reliable products and great support
NADDOD truly understands our needs! Best choice for AI network!

Contact us

Partner with NADDOD to Accelerate Your InfiniBand Network for Next-Gen AI Innovation
