AI Computing Cluster Connection Device Selection

NADDOD Quinn InfiniBand Network Architect Mar 1, 2024

In current AI computing clusters, it is estimated that up to 80% of power is consumed by data transmission and up to 90% of time is spent on disk I/O and network communication, leaving only a small share for actual computation. Our own tracking of usage and feedback data from the Laxcus distributed operating system in data centers and computing centers shows that disk I/O and network communication consume more than 50% of the time. Because disk I/O and network communication have such a large impact on large-scale distributed computing, this article discusses how to select network communication equipment for AI computing clusters.
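The practical effect of these shares can be illustrated with Amdahl's law. This framing is our illustration, not part of the Laxcus measurements. Assume a fraction f of each step is spent on disk I/O and network communication, and that only the computational part is accelerated by a factor s. The overall speedup is then 1 / (f + (1 - f) / s), which can never exceed 1 / f. The minimal sketch below evaluates this; the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(comm_fraction: float, compute_speedup: float) -> float:
    """Overall speedup when only the compute part of a step gets faster.

    comm_fraction: share of step time spent on disk I/O and network
        communication (assumed unaffected by the compute speedup).
    compute_speedup: factor by which the compute part is accelerated.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be in [0, 1]")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    compute_fraction = 1.0 - comm_fraction
    return 1.0 / (comm_fraction + compute_fraction / compute_speedup)


if __name__ == "__main__":
    # With 50% of time in I/O and communication, the ceiling is 1 / 0.5 = 2x,
    # no matter how much faster the compute part becomes.
    for s in (2, 10, 1000):
        print(f"compute x{s}: overall x{amdahl_speedup(0.5, s):.2f}")
```

With f = 0.5, even a 1000x faster compute stage yields less than a 2x overall gain. This is why the network deserves as much attention as the accelerators.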


When choosing network communication equipment for an AI computing cluster, consider the following factors:


  1. Packet loss frequency of the communication devices

  2. Communication latency between computing nodes

  3. Mechanisms for resolving congestion between computing nodes


All AI computing clusters are distributed systems, and AI computation follows the "weakest link in the chain" principle. Frequent packet loss, high latency, or unresolved congestion on any single link can severely degrade the computational performance of the entire cluster.
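The "weakest link" effect can be sketched as follows. The simplifying assumptions are ours, not the article's:

- Each step is synchronous, so every node waits for the slowest one.
- Transfer time follows a latency-plus-bandwidth model.
- A lost transfer is resent in full after a fixed timeout.
- Losses are independent.

The `Node` class, its fields, and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical per-node parameters for one training step."""
    compute_s: float      # computation time per step (seconds)
    msg_bytes: float      # bytes exchanged per step
    latency_s: float      # fixed per-transfer latency (seconds)
    bandwidth_Bps: float  # link bandwidth (bytes per second)
    loss_prob: float      # probability that a transfer is lost
    timeout_s: float      # wait before a lost transfer is resent (seconds)


def expected_comm_time(n: Node) -> float:
    """Expected delivery time when each loss forces a full resend after a
    timeout and losses are independent (geometric number of attempts)."""
    if not 0.0 <= n.loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    attempt = n.latency_s + n.msg_bytes / n.bandwidth_Bps
    expected_attempts = 1.0 / (1.0 - n.loss_prob)
    expected_failures = expected_attempts - 1.0
    return expected_attempts * attempt + expected_failures * n.timeout_s


def step_time(nodes: Sequence[Node]) -> float:
    """A synchronous step ends only when the slowest node finishes."""
    return max(n.compute_s + expected_comm_time(n) for n in nodes)


if __name__ == "__main__":
    # 1 GB per step over a ~400 Gb/s (50 GB/s) link; values are illustrative.
    healthy = Node(0.100, 1e9, 5e-6, 50e9, 0.0, 0.0)
    lossy = Node(0.100, 1e9, 5e-6, 50e9, 0.01, 0.5)
    print(f"8 healthy nodes: {step_time([healthy] * 8):.4f} s")
    print(f"7 healthy + 1 lossy: {step_time([healthy] * 7 + [lossy]):.4f} s")
```

In this example, one node with a 1% loss rate lengthens the step for all eight nodes, even though the other seven links are healthy.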


Within servers, there are two main options for communication equipment:


  1. NVLink

  2. PCIe


Between servers (computing nodes), there are three options:


  1. NVSwitch

  2. InfiniBand (IB) network

  3. RoCE (RDMA over Converged Ethernet) network


NVSwitch is an NVIDIA product that is currently not sold separately. It is typically bundled with NVIDIA's own hardware, so it is less commonly encountered on the open market. Several years ago, however, we deployed the Laxcus distributed operating system on DGX servers and tested NVSwitch with network testing tools. In those tests, its communication efficiency was roughly twice that of the IB network.


Since its introduction in 2000, InfiniBand (IB) has been a preferred solution for high-speed communication. Its advantages include high bandwidth, low latency, low packet loss, and remote direct memory access (RDMA). It is widely used in high-performance computing, for example in server clusters and supercomputers. However, IB networks also have drawbacks: they are expensive, and IB switches are harder to maintain, manage, and scale. IB networks perform well in small AI compute clusters, but scaling them to large AI compute clusters is challenging.
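One way to see why scaling is hard is to count switches. Assume a non-blocking fat-tree topology; this is a common choice for IB clusters but is used here only as an illustration. Under that assumption:

- A two-tier leaf-spine fabric of k-port switches supports k^2/2 hosts with 3k/2 switches.
- A classic three-tier fat tree supports k^3/4 hosts with 5k^2/4 switches.

The sketch below computes both. The function names are hypothetical, and the port counts 40 and 64 are example values, not product recommendations.

```python
from typing import Tuple


def leaf_spine(k: int) -> Tuple[int, int]:
    """Non-blocking two-tier fabric of k-port switches -> (hosts, switches).

    Each of the k leaves uses k/2 ports for hosts and k/2 for uplinks;
    each of the k/2 spines connects once to every leaf.
    """
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even port count")
    leaves, spines = k, k // 2
    return leaves * (k // 2), leaves + spines


def three_tier_fat_tree(k: int) -> Tuple[int, int]:
    """Classic k-ary three-tier fat tree -> (hosts, switches).

    k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2
    core switches; each edge switch serves k/2 hosts.
    """
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even port count")
    half = k // 2
    hosts = k * half * half         # = k^3 / 4
    switches = k * k + half * half  # = 5 k^2 / 4
    return hosts, switches


if __name__ == "__main__":
    for k in (40, 64):  # example switch radices only
        h2, s2 = leaf_spine(k)
        h3, s3 = three_tier_fat_tree(k)
        print(f"k={k}: two-tier {h2} hosts / {s2} switches, "
              f"three-tier {h3} hosts / {s3} switches")
```

Adding a tier multiplies host capacity. It also multiplies the number of switches, cables, and optical modules to buy and manage, which is one source of the cost and management burden of large IB fabrics.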


RoCE networks generally offer slightly lower communication efficiency than IB networks. The exact gap depends on network configuration, workload characteristics, and application requirements. In general, though, the gap stems from the additional overhead introduced by RoCE's protocol stack design and packet processing. IB networks, by contrast, are purpose-built for high-performance computing and data center applications, and they provide lower latency and higher throughput.
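Part of the protocol overhead can be quantified from the packet formats. The sketch below compares per-packet header and trailer bytes for two formats:

- RoCEv2: Ethernet, IPv4, UDP, InfiniBand Base Transport Header, and CRCs.
- Native InfiniBand: Local Route Header, BTH, and CRCs.

It assumes no VLAN tag, no IP options, and no IB Global Route Header. Ethernet preamble and inter-frame gap are not counted. The dictionary and function names are hypothetical.

```python
from typing import Dict

# Per-packet header/trailer bytes. Assumptions: untagged Ethernet II frame,
# IPv4 without options, no IB Global Route Header (GRH); preamble, SFD and
# inter-frame gap are not counted.
ROCEV2_OVERHEAD: Dict[str, int] = {
    "Ethernet header": 14,
    "IPv4 header": 20,
    "UDP header": 8,
    "IB Base Transport Header (BTH)": 12,
    "Invariant CRC (ICRC)": 4,
    "Ethernet FCS": 4,
}
IB_OVERHEAD: Dict[str, int] = {
    "Local Route Header (LRH)": 8,
    "Base Transport Header (BTH)": 12,
    "Invariant CRC (ICRC)": 4,
    "Variant CRC (VCRC)": 2,
}


def payload_efficiency(payload_bytes: int, overhead: Dict[str, int]) -> float:
    """Share of packet bytes that carry application payload."""
    return payload_bytes / (payload_bytes + sum(overhead.values()))


if __name__ == "__main__":
    for payload in (256, 1024, 4096):
        roce = payload_efficiency(payload, ROCEV2_OVERHEAD)
        ib = payload_efficiency(payload, IB_OVERHEAD)
        print(f"{payload:5d} B payload: RoCEv2 {roce:.2%}, IB {ib:.2%}")
```

At a 4096-byte payload, the header difference is under one percentage point, which is consistent with RoCE being only slightly less efficient. With 256-byte payloads, the gap widens to about ten percentage points. Any larger gap seen in practice would more likely come from packet processing and network configuration than from header size.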


Despite their lower communication efficiency, RoCE networks offer advantages in cost and versatility. RoCE uses Ethernet as the underlying transport, so it can be deployed on existing Ethernet infrastructure, provided the switches support the lossless features RoCE typically relies on, such as Priority Flow Control. This reduces cost and deployment complexity. RoCE therefore remains a popular choice for AI compute clusters and other scenarios that prioritize flexibility and cost-effectiveness.


As technology and demands continue to evolve, each of these technologies is likely to show its strengths in different application scenarios. Both InfiniBand and Ethernet will keep driving innovation in information technology, meeting ever-growing bandwidth demands with efficient data transfer and processing.


InfiniBand and RoCE Network Solution Provider: NADDOD


NADDOD provides lossless network solutions based on InfiniBand and RoCE, creating lossless network environments and high-performance computing capabilities for users. For each application scenario and set of user requirements, NADDOD selects the solution best suited to the specific situation, delivering high bandwidth, low latency, and high-performance data transfer. This addresses network bottlenecks and improves both network performance and user experience.


With the rapid development of technologies such as cloud computing, big data, and the Internet of Things, data transmission rates continue to increase. Traditional copper cable transmission faces bandwidth bottlenecks and signal attenuation, while fiber optic transmission has become the primary choice for modern communication thanks to its high bandwidth and low loss. NADDOD's 400G multimode optical modules and AOCs (active optical cables), together with DACs (direct attach copper cables) for short-reach links, meet the demand for high-speed transmission while offering stable and reliable solutions.


As technology continues to evolve, 400G multimode optical modules, AOCs, and DACs are expected to play a leading role in networking, providing robust support for the network demands of the digital era. As a professional module manufacturer, NADDOD produces optical modules from 1G to 800G, and we welcome you to learn more and make purchases.

