NVLink vs. PCIe: Choosing the Best Option for NVIDIA AI Servers

NADDOD Neo Switch Specialist Mar 11, 2024

In the field of artificial intelligence, NVIDIA, as an industry leader, has introduced two main GPU versions for AI servers: the NVLink Edition (specifically, the SXM Edition) and the PCIe Edition. What are the fundamental differences between these two options? And how can you make the best choice based on your specific application scenarios? Let's delve into the details.

 

NVLink Edition Servers

 

The SXM form factor is a high-bandwidth socketed solution developed by NVIDIA to achieve ultra-fast interconnectivity between GPUs. This design allows GPUs to integrate seamlessly with NVIDIA's own DGX and HGX systems, which provide SXM sockets tailored to each generation of NVIDIA GPUs, from the latest models such as the H800, H100, A800, and A100 back to earlier models like the P100 and V100, ensuring the most efficient connection between the GPUs and the system. For example, an image of 8 A100 SXM cards working in parallel in the Inspur NF5488A5 HGX system illustrates this integration capability.

 

NVLink

In HGX systems, 8 GPUs are tightly coupled via NVLink, forming a high-bandwidth interconnect network. Specifically, each H100 GPU connects to 4 NVSwitch chips, giving GPU-to-GPU transfer speeds of up to 900 GB/s of NVLink bandwidth. In addition, each H100 SXM GPU connects to the CPU over a PCIe interface, so data can move quickly from any GPU to the CPU for processing.
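
To see how this topology looks from software, the short sketch below (an illustrative example under assumed device indices, not vendor reference code) uses the CUDA runtime API to check which GPU pairs support direct peer-to-peer access, the capability that NVLink-connected SXM GPUs expose to applications, and enables it where available.

// peer_check.cu -- a minimal sketch: which GPU pairs can talk directly,
// bypassing the CPU/PCIe path? Compile with: nvcc peer_check.cu -o peer_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA devices\n", count);

    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            // Returns 1 if 'src' can map and access memory on 'dst' directly
            // (over NVLink/NVSwitch on SXM boards, or PCIe P2P where supported).
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            printf("GPU %d -> GPU %d : peer access %s\n",
                   src, dst, canAccess ? "supported" : "NOT supported");
            if (canAccess) {
                // Enabling peer access lets kernels and cudaMemcpyPeer use the
                // direct link instead of staging through host memory.
                cudaSetDevice(src);
                cudaDeviceEnablePeerAccess(dst, 0);
            }
        }
    }
    return 0;
}

On an HGX board with NVSwitch, every GPU pair typically reports peer access as supported; on a PCIe machine the result depends on the NVLink bridges and the PCIe topology.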

 

Basic DGX-HGX GPU to CPU Block Diagram

Further enhancing this high-performance interconnectivity is the NVSwitch chip, which links all of the SXM GPUs on a DGX or HGX system board into an efficient GPU-to-GPU data exchange fabric. The full-spec A100 delivers up to 600 GB/s of NVLink bandwidth, the H100 raises this to 900 GB/s, and even the market-specific A800 and H800 models retain 400 GB/s of high-speed interconnect.
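
As a rough way to relate these headline numbers to something measurable, the sketch below times a single GPU 0 to GPU 1 peer copy with CUDA events. Keep in mind that the 600/900 GB/s figures are aggregate, bidirectional NVLink bandwidth across all of a GPU's links, so a single one-directional copy will report considerably less; the device indices and buffer size here are assumptions, not part of any vendor tool.

// p2p_bandwidth.cu -- rough timing of a GPU 0 -> GPU 1 peer copy.
// Compile with: nvcc p2p_bandwidth.cu -o p2p_bandwidth
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ull << 30;              // 1 GiB payload
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);             // let GPU 0 reach GPU 1 directly

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy, then a timed copy from GPU 0's buffer to GPU 1's buffer.
    cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, 0);
    cudaEventRecord(start);
    cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU0 -> GPU1: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}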

 

As for the difference between DGX and HGX: NVIDIA DGX is a pre-integrated, highly scalable, complete server product that delivers industry-leading performance in its form factor. Multiple NVIDIA DGX H800 units can be combined through the NVLink Switch System into a DGX SuperPOD of 32 or even 64 nodes, meeting the demands of large-scale model training. HGX, on the other hand, is a GPU baseboard that original equipment manufacturers (OEMs) use to build their own customized servers.

 

PCIe Edition Servers

 

Compared with the all-to-all interconnectivity of SXM GPUs, the interconnectivity of PCIe GPUs is more traditional and limited. In this architecture, a GPU is directly connected only to its adjacent GPU through an NVLink Bridge, as shown in the diagram: GPU 1 can connect directly only to GPU 2, while communication between non-adjacent GPUs (such as GPU 1 and GPU 8) must travel over the slower PCIe bus with the CPU's involvement. The most advanced PCIe standard in current use, PCIe 5.0, tops out at roughly 128 GB/s of bidirectional bandwidth on an x16 link (about 64 GB/s in each direction), far below the bandwidth of NVLink.
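
The sketch below makes that slower path explicit: a copy between two non-bridged PCIe GPUs is staged through pinned host memory, so the data crosses PCIe twice with the CPU's root complex in the middle. (cudaMemcpyPeer performs a comparable fallback automatically when peer access is unavailable; GPU indices 0 and 7 are simply placeholders for a non-bridged pair.)

// host_staged_copy.cu -- data exchange between two PCIe GPUs without an
// NVLink bridge, staged through host memory over PCIe.
// Compile with: nvcc host_staged_copy.cu -o host_staged_copy
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256ull << 20;            // 256 MiB payload
    void *d_src = nullptr, *d_dst = nullptr, *staging = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&d_src, bytes);
    cudaSetDevice(7);
    cudaMalloc(&d_dst, bytes);

    cudaMallocHost(&staging, bytes);              // pinned host buffer for fast DMA

    // Leg 1: GPU 0 -> host over PCIe.
    cudaSetDevice(0);
    cudaMemcpy(staging, d_src, bytes, cudaMemcpyDeviceToHost);
    // Leg 2: host -> GPU 7 over PCIe.
    cudaSetDevice(7);
    cudaMemcpy(d_dst, staging, bytes, cudaMemcpyHostToDevice);

    printf("Copied %zu bytes via host staging\n", bytes);

    cudaFreeHost(staging);
    cudaSetDevice(0); cudaFree(d_src);
    cudaSetDevice(7); cudaFree(d_dst);
    return 0;
}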

 

Basic 8 PCIe GPU to CPU Block Diagram

However, although the GPU-to-GPU interconnect bandwidth of the PCIe edition is lower than that of the SXM edition, there is no significant difference in the computational performance of the GPU cards themselves. For application scenarios that do not rely heavily on high-speed GPU-to-GPU communication, such as small- to medium-scale model training and inference deployment, the interconnect bandwidth between GPUs has little impact on overall performance.

 

A comparison chart of the parameters between A100 PCIe and A100 SXM GPUs shows that there is no significant difference in their computational core performance.

 

A100 PCIe vs. A100 SXM

NVLink vs. PCIe

The advantage of PCIe-based GPUs lies primarily in their flexibility and adaptability. For users with smaller workloads who need flexible GPU configurations, PCIe GPUs are an excellent choice. For example, some GPU servers require only 4 or fewer GPU cards; in such cases, PCIe GPUs allow the server to be kept compact and fit easily into a 1U or 2U chassis, reducing data center rack space requirements.

 

Furthermore, in inference deployment environments, virtualization is often used to partition and finely allocate resources, pairing CPUs and GPUs one to one. In this scenario, PCIe GPUs are favored for their lower power consumption (approximately 300 W per GPU) and broad compatibility. SXM GPUs in the HGX architecture, by contrast, draw considerably more power, from roughly 400-500 W per GPU for the A100 SXM up to 700 W for the H100 SXM; they trade some energy efficiency for top-tier interconnect performance.
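
For readers who want to check these power figures on their own hardware, the short NVML sketch below (illustrative only, not NADDOD or NVIDIA reference code) prints each GPU's enforced power limit, which can be compared against the rough numbers quoted above.

// power_limits.cu -- print each GPU's enforced power limit via NVML.
// Compile with: nvcc power_limits.cu -lnvidia-ml -o power_limits
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit() != NVML_SUCCESS) {
        printf("NVML init failed\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        unsigned int limitMw = 0;

        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetName(dev, name, sizeof(name));
        nvmlDeviceGetEnforcedPowerLimit(dev, &limitMw);  // reported in milliwatts
        printf("GPU %u (%s): power limit %u W\n", i, name, limitMw / 1000);
    }

    nvmlShutdown();
    return 0;
}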

 

In summary, NVLink-based (SXM) GPUs and PCIe-based GPUs serve different market needs. For large-scale AI model training tasks with extremely high demands for interconnect bandwidth between GPUs, SXM-based GPUs with their unparalleled NVLink bandwidth and exceptional performance become the ideal computing platform. However, for users who prioritize flexibility, cost savings, moderate performance, and broad compatibility, PCIe-based GPUs are a suitable choice. They are particularly well-suited for lightweight workloads, limited GPU resource allocation, and various inference application deployment scenarios.

 

When purchasing NVIDIA AI servers, it is crucial to consider current business requirements, future development plans, and cost-effectiveness. It is important to assess the strengths and weaknesses of both GPU server versions to find the solution that best fits your specific needs. The ultimate goal is to maximize return on investment while ensuring computational efficiency and leaving room for future expansion.

 

NADDOD offers network solutions built around both NVLink-based and PCIe-based servers, enabling users to create lossless network environments and high-performance computing capabilities. For different application scenarios and user requirements, NADDOD can tailor the optimal solution to the specific circumstances, delivering high-bandwidth, low-latency, high-performance data transmission that effectively removes network bottlenecks and improves both network performance and user experience.