Optical Transceiver Requirements in NVIDIA DGX H100 Server Cluster

NADDOD Dylan InfiniBand Solutions Architect Jul 19, 2023

In the NVIDIA DGX H100 server cluster, each scalable unit (SU) consists of 32 DGX H100 servers. Each server rack holds four DGX H100 servers and is equipped with three Power Distribution Units (PDUs). The switches are placed in two separate racks, so each SU occupies ten racks: eight for servers and two for switches.
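The rack count follows directly from the figures above. A minimal sketch of that arithmetic, where the constant and function names are illustrative and not part of any NVIDIA tooling:

```python
# Figures taken from the paragraph above.
SERVERS_PER_SU = 32
SERVERS_PER_RACK = 4
SWITCH_RACKS_PER_SU = 2


def server_racks_per_su(servers: int = SERVERS_PER_SU,
                        per_rack: int = SERVERS_PER_RACK) -> int:
    """Racks needed to hold the servers of one SU (rounded up)."""
    return -(-servers // per_rack)  # ceiling division


def racks_per_su() -> int:
    """Server racks plus the two switch racks of one SU."""
    return server_racks_per_su() + SWITCH_RACKS_PER_SU


print(server_racks_per_su())  # 8
print(racks_per_su())         # 10
```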

1. Compute Network (Compute Fabric)

According to the NVIDIA reference design, a cluster of 128 DGX H100 servers needs only two layers of switches in the compute network, and both layers use the NVIDIA QM9700 switch. Each SU contains 32 DGX H100 servers and eight Leaf switches, and every DGX H100 in an SU must connect to all eight Leaf switches. Each server has only four 800G OSFP ports for the compute network, so each OSFP port is expanded into two QSFP ports, which gives each DGX H100 the eight links it needs to reach the eight Leaf switches. The server side therefore requires 800G optical transceivers.
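As a quick consistency check on this topology, the sketch below derives the number of links per server and the resulting number of downlinks per Leaf switch. It assumes each server sends exactly one link to each Leaf switch in its SU, as described above; the names are illustrative only.

```python
# Figures taken from the paragraph above.
OSFP_PORTS_PER_SERVER = 4   # 800G OSFP ports per DGX H100 for the compute network
LINKS_PER_OSFP_PORT = 2     # each OSFP port is expanded into two ports
SERVERS_PER_SU = 32
LEAF_SWITCHES_PER_SU = 8


def links_per_server() -> int:
    """Compute-network links leaving one DGX H100."""
    return OSFP_PORTS_PER_SERVER * LINKS_PER_OSFP_PORT


def downlinks_per_leaf() -> int:
    """Downlinks on each Leaf switch, assuming the servers' links are
    spread evenly with one link per server per Leaf switch."""
    total_links = SERVERS_PER_SU * links_per_server()
    return total_links // LEAF_SWITCHES_PER_SU


print(links_per_server())    # 8
print(downlinks_per_leaf())  # 32
```

The eight links per server match the eight Leaf switches per SU, and each Leaf switch ends up with one downlink per server in the SU.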

The demand for 800G optical transceivers on the server side is 4 * 32 * 4 = 512 (first 4: each server has 4 800G OSFP ports; 32: each SU has 32 servers; second 4: there are 4 SUs in the cluster). The downlink ports of the Leaf switches require 400G optical transceivers, with a demand of 32 * 8 * 4 = 1024 (32: each Leaf switch has 32 downlink ports, one for each of the 32 servers in its SU; 8: each SU has 8 Leaf switches; 4: there are 4 SUs in the cluster).

The uplink ports of the Leaf switches use 800G optical transceivers, with a demand of 16 * 8 * 4 = 512 (16: each Leaf switch has 16 uplink ports, one for each of the 16 Spine switches; 8: each SU has 8 Leaf switches; 4: there are 4 SUs in the cluster). The downlink ports of the Spine switches also use 800G optical transceivers, with a demand of 32 * 16 = 512 (32: each Spine switch has 32 downlink ports, one for each of the 32 Leaf switches; 16: there are 16 Spine switches in the cluster).

Based on the calculations above, the compute network in this server cluster requires 1536 800G optical transceivers (512 + 512 + 512) and 1024 400G optical transceivers. Divided across the 128 servers, each DGX H100 corresponds to 12 800G optical transceivers and 8 400G optical transceivers (1024 / 128). With eight H100 GPUs per DGX H100, each H100 requires 1.5 800G optical transceivers and 1.0 400G optical transceivers.
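The whole tally can be reproduced in a few lines. The sketch below recomputes the totals from the per-port figures quoted in this section and divides them by the number of servers and GPUs. The class and field names are illustrative, and the figure of eight GPUs per DGX H100 is an assumption not stated explicitly in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeFabric:
    sus: int = 4                    # SUs in the cluster
    servers_per_su: int = 32
    osfp_ports_per_server: int = 4  # 800G OSFP ports per server
    leaves_per_su: int = 8
    leaf_downlinks: int = 32        # 400G ports per Leaf switch, toward servers
    leaf_uplinks: int = 16          # 800G ports per Leaf switch, toward Spine
    spines: int = 16
    spine_downlinks: int = 32       # 800G ports per Spine switch, toward Leaf
    gpus_per_server: int = 8        # assumption: eight H100 GPUs per DGX H100

    @property
    def servers(self) -> int:
        return self.sus * self.servers_per_su

    def count_800g(self) -> int:
        server_side = self.osfp_ports_per_server * self.servers_per_su * self.sus
        leaf_uplink = self.leaf_uplinks * self.leaves_per_su * self.sus
        spine_downlink = self.spine_downlinks * self.spines
        return server_side + leaf_uplink + spine_downlink

    def count_400g(self) -> int:
        return self.leaf_downlinks * self.leaves_per_su * self.sus


fabric = ComputeFabric()
print(fabric.servers)                        # 128
print(fabric.count_800g())                   # 1536
print(fabric.count_400g())                   # 1024
print(fabric.count_800g() / fabric.servers)  # 12.0 per server
print(fabric.count_400g() / fabric.servers)  # 8.0 per server

gpus = fabric.servers * fabric.gpus_per_server
print(fabric.count_800g() / gpus)            # 1.5 per GPU
print(fabric.count_400g() / gpus)            # 1.0 per GPU
```

Changing `sus` or the per-switch port counts gives a rough estimate for other cluster sizes, although a larger cluster may need a different number of Spine switches or a third switch layer, which this sketch does not model.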