Unveiling The Evolution of NVLink

NADDOD Neo Switch Specialist Feb 7, 2024

NVLink is a technology developed by NVIDIA for high-speed point-to-point interconnectivity between GPUs (and also between GPUs and CPUs). It aims to overcome the bandwidth limitations of PCIe interconnects and enable low-latency, high-bandwidth data communication between GPU chips, allowing them to work more efficiently in tandem.

 

Before the introduction of NVLink technology (prior to 2014), GPUs had to be interconnected through PCIe switches, as illustrated in the diagram below. GPU-to-GPU traffic had to traverse the PCIe switch, with the CPU involved in distributing and scheduling the data, which added extra latency and limited system performance. At that time, the PCIe protocol had reached its 3rd generation, with a per-lane rate of 8 Gb/s and 16 lanes, providing a total unidirectional bandwidth of 16 GB/s (128 Gb/s, where 1 byte equals 8 bits). As GPU chip performance continued to improve, this interconnect bandwidth became a bottleneck.
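As a quick check on this arithmetic, the raw PCIe 3.0 x16 figure can be reproduced in a few lines of Python; this is a minimal sketch that ignores the 128b/130b encoding overhead, which trims the usable figure to roughly 15.8 GB/s:

    # Raw PCIe 3.0 x16 bandwidth, ignoring 128b/130b encoding overhead.
    lanes = 16
    gbps_per_lane = 8                    # PCIe 3.0 signalling rate per lane, Gb/s

    total_gbps = lanes * gbps_per_lane   # 128 Gb/s
    total_GBps = total_gbps / 8          # 16 GB/s (1 byte = 8 bits)
    print(f"PCIe 3.0 x16: {total_gbps} Gb/s = {total_GBps} GB/s per direction")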

 

[Figure: GPUs interconnected through PCIe switches]

 

NVLink 1.0

In 2014, NVLink 1.0 was released and implemented in the P100 chip, as shown in the diagram below. There are four NVLink connections between two GPUs, each link consisting of eight lanes. Each lane operates at 20 Gb/s, so the overall bidirectional bandwidth of the system is 160 GB/s ((20 × 8 × 4 × 2) / 8 = 160 GB/s), five times the 32 GB/s bidirectional bandwidth of PCIe 3.0 x16.
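For readers who want to recompute these figures, a minimal Python helper (the function name is ours, not NVIDIA's) captures the pattern used for every NVLink generation in this article:

    # Bidirectional NVLink bandwidth from link count, lanes per link and lane rate.
    def nvlink_bidir_GBps(links, lanes_per_link, gbps_per_lane):
        # both directions, then bits -> bytes
        return links * lanes_per_link * gbps_per_lane * 2 / 8

    # NVLink 1.0 between two P100 GPUs: 4 links x 8 lanes x 20 Gb/s
    print(nvlink_bidir_GBps(4, 8, 20))   # 160.0 GB/s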

 

[Figure: Two P100 GPUs connected by four NVLink 1.0 links]

Each NVLink is composed of 16 pairs of differential lines, corresponding to eight lanes in each direction, as shown in the diagram below. Each end of a differential pair terminates in a PHY, which contains the SerDes.

 

[Figure: NVLink lane structure: differential pairs terminated by PHYs with SerDes]

Based on NVLink 1.0, a planar mesh topology can be formed among four GPUs, with a point-to-point connection between each pair of GPUs. Eight GPUs correspond to a hybrid cube-mesh, which is the topology used in the DGX-1 server. It is important to note, however, that in this configuration the eight GPUs do not form a fully connected network.
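To make the "not fully connected" point concrete, here is a small Python sketch of the hybrid cube-mesh commonly described for the DGX-1: each GPU spends its four NVLinks on the other three GPUs in its quad plus its counterpart in the opposite quad. The exact cabling here is our assumption for illustration, not a wiring diagram of the product:

    from itertools import combinations

    # Assumed DGX-1 hybrid cube-mesh: two fully connected quads of GPUs,
    # plus one link between each GPU and its counterpart in the other quad.
    quads = [(0, 1, 2, 3), (4, 5, 6, 7)]
    links = set()
    for quad in quads:
        links.update(combinations(quad, 2))          # fully connected quad
    links.update((i, i + 4) for i in range(4))       # cross links between quads

    degree = {g: sum(g in l for l in links) for g in range(8)}
    print(degree)                                    # every GPU uses exactly 4 NVLinks
    print(len(links), "links used,",
          len(list(combinations(range(8), 2))), "pairs needed for a full mesh")
    print((1, 6) in links or (6, 1) in links)        # False: GPU1 and GPU6 have no direct link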

 

[Figure: DGX-1 8-GPU hybrid cube-mesh topology]

NVLink 2.0

In 2017, NVIDIA introduced the second-generation NVLink technology. It connects two V100 GPU chips with six NVLink connections, each consisting of eight lanes. The per-lane speed was increased to 25 Gb/s, giving a bidirectional system bandwidth of 300 GB/s ((25 × 8 × 6 × 2) / 8 = 300 GB/s), nearly twice that of NVLink 1.0. In addition, to achieve full interconnectivity among eight GPUs, NVIDIA introduced NVSwitch. NVSwitch 1.0 has 18 ports, each with a bandwidth of 50 GB/s, for a total bandwidth of 900 GB/s. Each NVSwitch retains two ports for connection to the CPU. With six NVSwitches, a fully connected network can be established among eight V100 GPU chips, as shown in the diagram below.
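One way to see why six NVSwitches are enough is a simple port budget: if each of the eight GPUs drives one of its six NVLinks to each switch (our assumed wiring), every switch uses 8 of its 18 ports for GPUs, leaving room for the two reserved ports mentioned above:

    # Port-budget check for eight V100 GPUs behind six NVSwitch 1.0 chips
    # (assumes each GPU sends exactly one of its six NVLinks to each switch).
    gpus, nvlinks_per_gpu = 8, 6
    switches, ports_per_switch = 6, 18
    reserved_ports = 2                                           # per the text above

    gpu_ports_per_switch = gpus * nvlinks_per_gpu // switches    # 8
    assert gpu_ports_per_switch + reserved_ports <= ports_per_switch

    per_port_GBps = 50                                           # per the text above
    print("NVSwitch aggregate bandwidth:", ports_per_switch * per_port_GBps, "GB/s")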

 

[Figure: Eight V100 GPUs fully connected through six NVSwitches]

The DGX-2 system is composed of two boards, as shown in the diagram below, achieving full connectivity among 16 GPU chips.

 

[Figure: DGX-2 with 16 GPUs across two boards]

 

NVLink 3.0

In 2020, NVLink 3.0 was introduced. It connects two A100 GPU chips with 12 NVLink connections, each consisting of four lanes. Each lane operates at 50 Gb/s, resulting in a bidirectional system bandwidth of 600 GB/s, twice that of NVLink 2.0. With the larger number of NVLinks, the port count of the NVSwitch also increased to 36, with each port still operating at 50 GB/s.

 

The DGX A100 system consists of 8 A100 GPU chips and 6 NVSwitches, as shown in the diagram below.

 

[Figure: DGX A100 topology]

NVLink 4.0

In 2022, NVLink was upgraded to its fourth generation, allowing two H100 GPU chips to be interconnected through 18 NVLink links. Each link consists of two lanes, and each lane runs at 100 Gb/s (PAM4), raising the bidirectional total bandwidth to 900 GB/s. NVSwitch was also upgraded to its third generation, with each NVSwitch supporting 64 ports at 50 GB/s per port.

 

The DGX H100 consists of 8 H100 chips and 4 NVSwitches, as shown in the diagram below. On the other side of each NVSwitch, multiple 800G OSFP optical modules are connected. Taking the first NVSwitch on the left as an example, the unidirectional bandwidth on the GPU-facing side is 4 Tbps (4 × 5 NVLinks × 200 Gbps), and the bandwidth on the optical-module side is also 4 Tbps (5 × 800 Gbps). The two are equal, forming a non-blocking network.
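Using only the link counts quoted above, the non-blocking claim reduces to an equality check between the GPU-facing and optical-facing bandwidth of one NVSwitch; the sketch below simply restates the article's numbers in code:

    # Non-blocking check for one NVSwitch in the DGX H100 diagram,
    # using the link counts quoted in the text above.
    gpu_side_nvlinks = 4 * 5                     # NVLinks feeding this switch (per the text)
    nvlink_gbps = 200                            # NVLink 4.0, unidirectional per link

    osfp_modules = 5
    osfp_gbps = 800                              # 800G OSFP, unidirectional

    gpu_side = gpu_side_nvlinks * nvlink_gbps    # 4000 Gb/s = 4 Tb/s
    optical_side = osfp_modules * osfp_gbps      # 4000 Gb/s = 4 Tb/s
    print(gpu_side, optical_side, gpu_side == optical_side)   # non-blocking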

 

*Note that optical module bandwidth is typically quoted as unidirectional, whereas AI chip interconnect bandwidth is generally quoted as bidirectional.

 

[Figure: DGX H100 with four NVSwitches and 800G OSFP optical modules]

The following table summarizes the performance parameters of each generation of NVLink.

 

Generation                             NVLink 1.0   NVLink 2.0   NVLink 3.0   NVLink 4.0
Year                                   2014         2017         2020         2022
Transfer rate per link (GB/s)          20+20        25+25        25+25        25+25
Links per GPU                          4            6            12           18
Lanes per link                         8            8            4            2
Data rate per lane (Gb/s)              20           25           50           100 (PAM4)
Total bidirectional bandwidth (GB/s)   160          300          600          900
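The bandwidth row of this table follows directly from the link count, lane count, and per-lane rate; the short Python check below (using only values from the table) makes that relationship explicit:

    # Reproduce the "Total bidirectional bandwidth" row from the other rows.
    generations = {
        # name: (links, lanes_per_link, gbps_per_lane)
        "NVLink 1.0": (4, 8, 20),
        "NVLink 2.0": (6, 8, 25),
        "NVLink 3.0": (12, 4, 50),
        "NVLink 4.0": (18, 2, 100),
    }

    for name, (links, lanes, gbps) in generations.items():
        bidir_GBps = links * lanes * gbps * 2 / 8    # both directions, bits -> bytes
        print(f"{name}: {bidir_GBps:.0f} GB/s")      # 160, 300, 600, 900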

The following table shows the parameters of each generation of PCIe.

 

Generation                            PCIe 1.0   PCIe 2.0   PCIe 3.0   PCIe 4.0   PCIe 5.0   PCIe 6.0
Year                                  2003       2007       2010       2017       2019       2022
Transfer rate per lane (Gb/s)         2.5        5.0        8.0        16.0       32.0       64.0 (PAM4)
x16 unidirectional bandwidth (GB/s)   4.0        8.0        15.8       31.5       63.0       121
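The x16 figures in this table already include line-coding overhead. The sketch below reproduces them under our assumptions about the coding efficiency used for each generation (8b/10b for Gen 1-2, 128b/130b for Gen 3-5, and an approximate 242/256 FLIT efficiency for Gen 6):

    # Approximate PCIe x16 unidirectional bandwidth, including line-coding
    # overhead (the efficiency factors below are assumptions, see above).
    lanes = 16
    pcie = {
        # generation: (Gb/s per lane, encoding efficiency)
        "1.0": (2.5,  8 / 10),     # 8b/10b
        "2.0": (5.0,  8 / 10),     # 8b/10b
        "3.0": (8.0,  128 / 130),  # 128b/130b
        "4.0": (16.0, 128 / 130),
        "5.0": (32.0, 128 / 130),
        "6.0": (64.0, 242 / 256),  # PAM4 + FLIT framing (approximation)
    }

    for gen, (gbps, eff) in pcie.items():
        print(f"PCIe {gen} x16: {lanes * gbps * eff / 8:.1f} GB/s")
        # -> 4.0, 8.0, 15.8, 31.5, 63.0, 121.0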

In terms of per-lane rate, NVLink is typically around twice as fast as PCIe of the same era, and its advantage in total bandwidth is even more pronounced: NVLink provides roughly five times or more the total bandwidth of a PCIe x16 link.
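As a rough illustration of that ratio, the comparison below pairs generations that were on the market around the same time (the pairing itself is our assumption) and compares bidirectional totals:

    # Rough bidirectional-bandwidth ratios for roughly contemporaneous pairs
    # (the pairing of generations is an assumption for illustration).
    nvlink_bidir = {"NVLink 1.0": 160, "NVLink 4.0": 900}          # GB/s
    pcie_x16_unidir = {"PCIe 3.0": 15.8, "PCIe 5.0": 63.0}         # GB/s

    pairs = [("NVLink 1.0", "PCIe 3.0"), ("NVLink 4.0", "PCIe 5.0")]
    for nv, pc in pairs:
        ratio = nvlink_bidir[nv] / (2 * pcie_x16_unidir[pc])       # both bidirectional
        print(f"{nv} vs {pc} x16: {ratio:.1f}x")
    # -> roughly 5.1x and 7.1x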

 

Summary

Over the past decade, NVLink has become a core technology for NVIDIA GPU chips and an essential part of their ecosystem. It effectively addresses the high-bandwidth, low-latency interconnectivity challenges between GPU chips, revolutionizing traditional computing architectures. However, since NVLink is proprietary to NVIDIA, other AI chip companies can only use PCIe or other interconnect protocols. Additionally, NVIDIA is exploring the use of optical interconnects for GPU-to-GPU connections, as shown in the diagram below. This approach involves packaging silicon photonics chips alongside the GPUs, with optical fibers connecting the two GPU chips.

 

 

Since the acquisition of Mellanox, NVIDIA has also started combining NVLink technology with InfiniBand (IB) technology, introducing the new generation of NVSwitch chips and switches with SHARP functionality, optimized for external GPU server networks. The current scale of NVLink networks, which supports a maximum of 256 GPUs, is just the beginning. It is expected that the scale of NVLink networks will continue to grow and improve in the future, potentially creating a supercomputing cluster that integrates multiple networks, such as AI computing, CPU computing, and storage, into one cohesive system.

 

Naddod provides high-quality InfiniBand NDR 800G /HDR 200G /EDR 100G AOC and DAC series products for server clusters that require low latency, high bandwidth, and reliability in network applications. Our products offer exceptional performance while reducing costs and complexity. With multiple successful deliveries and real-world application cases, we ensure the highest quality standards. You can rely on us for product quality and availability as we always maintain sufficient inventory to meet your needs promptly.

 

In addition to offering third-party high-quality optical modules, we also stock a wide range of original NVIDIA products, providing you with more options. Contact us now to learn more details!

 

Naddod - Your trusted supplier of optical modules and high-speed cables!

 

Resource Links:

https://developer.nvidia.com/blog/upgrading-multi-gpu-interconnectivity-with-the-third-generation-nvidia-nvswitch/

https://www.servethehome.com/nvidia-nvlink4-nvswitch-at-hot-chips-34/

https://www.servethehome.com/nvidia-nvswitch-details-at-hot-chips-30/

https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s42663/

https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41784/

https://www.nextplatform.com/2022/08/17/nvidia-shows-what-optically-linked-gpu-systems-might-look-like/

https://blog.apnic.net/2023/08/10/large-language-models-the-hardware-connection/

https://en.wikichip.org/wiki/nvidia/nvswitch

https://developer.nvidia.com/blog/dgx-1-fastest-deep-learning-system/