Optical Transceiver Requirements in NVIDIA DGX H100 Server Cluster

NADDOD · Dylan, InfiniBand Solutions Architect · Jul 19, 2023

In the NVIDIA DGX H100 server cluster, each scalable unit (SU) consists of 32 DGX H100 servers. The servers are grouped four to a rack, and each rack is equipped with three Power Distribution Units (PDUs). The different types of switches are housed in two separate racks, so each SU occupies ten racks in total (eight for servers and two for switches).

1. Compute Network (Compute Fabric)

According to the NVIDIA reference design, in a cluster composed of 128 DGX H100 servers the compute network requires only two layers of switches, both using the NVIDIA QM9700 switch model. Each SU consists of 32 DGX H100 servers served by eight Leaf switches, and each DGX H100 in an SU must connect to all eight Leaf switches. Since each server has only four 800G OSFP ports available for compute network connections, breakout is used: each OSFP port is expanded into two QSFP ports, enabling each DGX H100 to reach all eight Leaf switches. On the server side, 800G optical transceivers are therefore required. A quick sanity check of this breakout arithmetic is sketched below.
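The following minimal Python sketch (port counts taken from the description above; variable names are our own) confirms that the breakout gives every server exactly one 400G link to each Leaf switch in its SU:

```python
# Per-server compute-fabric connectivity within one SU
OSFP_PORTS_PER_SERVER = 4   # 800G OSFP compute ports per DGX H100
BREAKOUT_FACTOR = 2         # each OSFP port breaks out into two 400G ports
LEAF_SWITCHES_PER_SU = 8    # Leaf switches serving the 32 servers of one SU

links_per_server = OSFP_PORTS_PER_SERVER * BREAKOUT_FACTOR
# Each server ends up with exactly one 400G link per Leaf switch.
assert links_per_server == LEAF_SWITCHES_PER_SU
```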


The demand for 800G optical transceivers on the server side is 4 * 32 * 4 = 512 (first 4: each server has four 800G OSFP ports; 32: each SU has 32 servers; second 4: the cluster has four SUs). For the downlink ports of the Leaf switches, 400G optical transceivers are required, with a demand of 32 * 8 * 4 = 1024 (32: each Leaf switch has 32 downlink ports connecting to 32 servers; 8: each SU has eight Leaf switches; 4: the cluster has four SUs). These two counts are reproduced in the sketch below.
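As a worked version of the two calculations above, here is a minimal Python sketch (all figures come from the text; names are our own):

```python
# Server-side 800G and Leaf-downlink 400G transceiver counts
SUS = 4                    # SUs in the 128-server cluster
SERVERS_PER_SU = 32        # DGX H100 servers per SU
OSFP_800G_PER_SERVER = 4   # 800G OSFP compute ports per server
LEAF_PER_SU = 8            # Leaf switches per SU
DOWNLINKS_PER_LEAF = 32    # 400G downlink ports per Leaf (one per server)

server_side_800g = OSFP_800G_PER_SERVER * SERVERS_PER_SU * SUS  # 4 * 32 * 4
leaf_downlink_400g = DOWNLINKS_PER_LEAF * LEAF_PER_SU * SUS     # 32 * 8 * 4
print(server_side_800g, leaf_downlink_400g)  # 512 1024
```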


The uplink ports of the Leaf switches use 800G optical transceivers, with a demand of 16 * 8 * 4 = 512 (16: each Leaf switch has 16 uplink ports connecting to 16 Spine switches; 8: each SU has eight Leaf switches; 4: the cluster has four SUs). The downlink ports of the Spine switches also use 800G optical transceivers, with a demand of 32 * 16 = 512 (32: each Spine switch has 32 downlink ports connecting to 32 Leaf switches; 16: the cluster has 16 Spine switches). Based on the calculations above, the compute network of this cluster requires 1536 800G optical transceivers and 1024 400G optical transceivers. Each DGX H100 thus corresponds to 12 800G optical transceivers and 8 400G optical transceivers (1024 / 128 = 8), meaning each H100 GPU requires 1.5 800G optical transceivers and 1 400G optical transceiver. The full tally is reproduced in the sketch below.
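The complete compute-fabric tally, as a self-contained Python sketch (figures from the text; names are our own):

```python
# Compute-fabric transceiver totals and per-server / per-GPU ratios
SUS, LEAF_PER_SU, SPINES = 4, 8, 16
SERVERS, GPUS_PER_SERVER = 128, 8

server_side_800g    = 4 * 32 * SUS            # 512 (server OSFP ports)
leaf_uplink_800g    = 16 * LEAF_PER_SU * SUS  # 512 (16 uplinks per Leaf)
spine_downlink_800g = 32 * SPINES             # 512 (32 downlinks per Spine)
leaf_downlink_400g  = 32 * LEAF_PER_SU * SUS  # 1024 (32 downlinks per Leaf)

total_800g = server_side_800g + leaf_uplink_800g + spine_downlink_800g  # 1536
total_400g = leaf_downlink_400g                                         # 1024

print(total_800g / SERVERS, total_400g / SERVERS)  # 12.0 8.0 per DGX H100
print(total_800g / (SERVERS * GPUS_PER_SERVER),    # 1.5 per GPU
      total_400g / (SERVERS * GPUS_PER_SERVER))    # 1.0 per GPU
```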

DGX H100 Cluster Compute Fabric

2. Storage Network (Storage Fabric)

In the DGX H100 server cluster, storage network connectivity relies primarily on 400G/200G optical transceivers and fibers. The number of transceivers/cables required for the connections between Leaf switches and storage devices can vary, so we estimate the port/transceiver demand of the storage network under the following assumptions: according to the NVIDIA white paper, a cluster composed of 128 DGX H100 servers requires a total of 16 QM9700 switches for the storage network. We assume that each DGX H100 has two storage network ports and that each switch has 64 400G ports, all populated with optical transceivers.


This estimate does not include the optical transceiver demand of redundant ports on the switches or of the UFM and storage devices. Under these assumptions, the system requires a total of 128 * 2 + 16 * 64 = 1280 400G optical transceivers. This means each server corresponds to 10 400G optical transceivers, and each GPU to approximately 1.25. The arithmetic is sketched below.
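As a worked version of the storage-fabric estimate (all figures from the assumptions above; names are our own):

```python
# Storage-fabric 400G transceiver estimate under the stated assumptions
SERVERS = 128
GPUS_PER_SERVER = 8
STORAGE_PORTS_PER_SERVER = 2  # assumed 400G storage ports per DGX H100
SWITCHES = 16                 # QM9700 switches in the storage fabric
PORTS_PER_SWITCH = 64         # 400G ports per switch, all assumed populated

total_400g = SERVERS * STORAGE_PORTS_PER_SERVER + SWITCHES * PORTS_PER_SWITCH
print(total_400g)                                # 1280
print(total_400g / SERVERS)                      # 10.0 per server
print(total_400g / (SERVERS * GPUS_PER_SERVER))  # 1.25 per GPU
```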

3. Calculation of Optical Transceiver/Chip Quantity Requirements in the DGX H100 Server Cluster

Following the same calculation process as for the DGX A100 cluster, we consider only the optical transceiver requirements of the compute network and storage network in the DGX H100 server cluster. Based on our calculations, each H100 GPU in the DGX H100 cluster corresponds to a demand of 1.5 800G optical transceivers and 2.25 400G optical transceivers (1 from the compute network plus 1.25 from the storage network). Assuming each 800G module contains eight 100G optical chips and each 400G module contains four, each H100 corresponds to approximately 21 100G optical chips (1.5 * 8 + 2.25 * 4). The combined per-GPU tally is sketched below.
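A minimal sketch of that per-GPU chip count, assuming 8x 100G optical chips per 800G module and 4x per 400G module (an assumption about module construction, not stated in the NVIDIA documents):

```python
# Per-GPU optical transceiver and 100G chip demand
per_gpu_800g = 1.5          # compute fabric
per_gpu_400g = 1.0 + 1.25   # compute fabric + storage fabric = 2.25

CHIPS_PER_800G = 8  # assumed 8 x 100G optical chips per 800G module
CHIPS_PER_400G = 4  # assumed 4 x 100G optical chips per 400G module

chips_per_gpu = per_gpu_800g * CHIPS_PER_800G + per_gpu_400g * CHIPS_PER_400G
print(chips_per_gpu)  # 21.0 -> roughly 21 100G optical chips per H100
```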

4. Conclusion

In summary, network architecture and optical transceiver requirements are crucial considerations for the NVIDIA DGX H100 server cluster. The compute network uses a two-layer architecture of NVIDIA QM9700 switches and requires a significant number of 800G and 400G optical transceivers, while the storage network relies on 400G optical transceivers for connectivity. By accurately estimating these requirements, we can better plan and meet the hardware needs of the cluster. The design and configuration of the network architecture and optical transceivers provide efficient data transfer and communication for the DGX H100 cluster, supporting high-performance deep learning and data processing workloads.