AI Data Center Network Architecture Requirements: 400G/800G Optical Modules

NADDOD Peter Optics Technician Apr 26, 2024

With the continuous development of AI technology and its related applications, the importance of large models, big data, and AI computing capabilities in AI development is becoming increasingly prominent. Large models and datasets constitute the software foundation of AI research, while AI computing power serves as the crucial infrastructure. In this article, we will explore the impact of AI development on data center network architecture.

 

Fat-Tree Data Center Network Architecture

 

With the widespread application of AI large-model training across various industries, traditional networks are unable to meet the bandwidth and latency requirements of large-model cluster training. Distributed training requires communication between GPUs, and its traffic pattern differs from traditional cloud computing, greatly increasing east-west traffic in AI/ML data centers. Bursty, high-volume AI traffic leads to network latency and degraded training performance in traditional network architectures. The Fat-Tree network therefore emerged to meet the demands of this bursty, high-volume data processing.
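
To make the east-west-traffic point concrete, the short sketch below estimates the per-GPU traffic generated by a ring all-reduce during one training step. The 2*(N-1)/N factor is the standard ring all-reduce communication volume; the model size, GPU count, and step time are hypothetical illustration values, not measurements.

```python
# Sketch: why distributed training generates heavy east-west traffic.
# Ring all-reduce moves roughly 2*(N-1)/N * gradient_bytes per GPU per step
# (standard formula; the model size, GPU count, and step time are hypothetical).

def ring_allreduce_bytes_per_gpu(gradient_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends (and receives) in one ring all-reduce."""
    return 2 * (num_gpus - 1) / num_gpus * gradient_bytes

if __name__ == "__main__":
    grad_bytes = 10e9 * 2          # e.g. a 10B-parameter model with FP16 gradients -> 20 GB
    per_gpu = ring_allreduce_bytes_per_gpu(grad_bytes, num_gpus=1024)
    step_time_s = 1.0              # hypothetical one-second training step
    print(f"~{per_gpu * 8 / step_time_s / 1e9:.0f} Gb/s of east-west traffic per GPU")
```

Under these assumptions each GPU must move on the order of hundreds of gigabits per second every step, which is exactly the bursty, high-volume pattern traditional tree networks struggle with.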

 

In a traditional tree-like network topology, bandwidth converges layer by layer, so the bandwidth available near the root is far less than the total bandwidth of all leaf nodes. A Fat-Tree, in contrast, resembles a real tree: the branches grow thicker toward the root, so aggregate link bandwidth does not shrink from leaf to root. This preserves network efficiency and accelerates the training process, and it is the fundamental premise that allows the Fat-Tree architecture to provide a non-blocking network.
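
As a rough illustration of this premise, the sketch below computes host and switch counts for a classic three-tier k-ary fat-tree built from k-port switches. The figures follow standard fat-tree topology arithmetic and are not specific to any vendor design.

```python
# Sketch: sizing a classic three-tier k-ary fat-tree built from k-port switches.
# Standard topology arithmetic, shown for illustration only.

def fat_tree_sizing(k: int) -> dict:
    """Return host and switch counts for a k-ary fat-tree (k must be even)."""
    assert k % 2 == 0, "a k-ary fat-tree requires an even port count"
    hosts = k ** 3 // 4                 # (k/2)^2 hosts per pod * k pods
    edge = agg = k * (k // 2)           # k pods, each with k/2 edge + k/2 aggregation switches
    core = (k // 2) ** 2                # (k/2)^2 core switches for full bisection bandwidth
    return {
        "hosts": hosts,
        "edge_switches": edge,
        "aggregation_switches": agg,
        "core_switches": core,
    }

if __name__ == "__main__":
    # Example: 64-port switches -> 65,536 hosts with full bisection bandwidth.
    print(fat_tree_sizing(64))
```

Because every tier carries the same aggregate capacity, any server can in principle communicate with any other at full line rate, which is what "non-blocking" means here.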

 

[Figure: AI Fat-Tree network architecture]

The Evolution of Data Center Network Speeds

 

As the complexity of data center applications continues to increase, so does the demand for network speed. From the 1G, 10G, and 25G of the past to the 100G widely deployed today, the pace of data center network upgrades is accelerating. Facing large-scale AI workloads, however, 400G and 800G transmission rates have become the next key steps in the evolution of data center networks.

 

[Figure: Data center network speed evolution]

 

AI Data Centers Drive the Development of 400G/800G Optical Modules

 

With the increasing demand for large-scale data processing in AI, demand for 400G/800G optical modules is also rising rapidly. The main reasons include:

 

  1. Large-scale data processing demand: AI training and inference require processing large datasets, so data centers must be able to transmit large volumes of data efficiently. 800G optical modules provide the greater bandwidth needed to address this. Upgraded data center network architectures are typically two-tier, extending from switches down to servers, with 400G as the base layer; upgrading to 800G therefore also drives growth in 400G demand (see the sketch after this list).

 

  2. Real-time demand: Real-time data processing is crucial in certain AI application scenarios. In autonomous driving systems, for example, the large volume of sensor data must be transmitted and processed quickly to ensure timely responses. High-speed optical modules meet these real-time demands by reducing data transmission latency, thereby improving system responsiveness.

 

  3. Concurrency of multiple tasks: Modern AI data centers typically need to handle many tasks simultaneously, such as image recognition and natural language processing. Adopting high-speed 400G/800G optical modules improves support for these multitasking workloads.

 

  4. Broad market prospects for 400G/800G optical modules: Demand for 400G and 800G optical modules has not yet grown significantly, but Dell'Oro forecasts a marked increase in 2024, driven by growth in AI computing demand. The rising need for high-speed data transmission from AI, big data, and cloud computing is likewise expected to accelerate growth of the 800G optical module market. This trend points to bright prospects for 400G/800G optical modules, whose applications will expand as the demands of advanced computing evolve.
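
As a back-of-the-envelope illustration of the two-tier point in item 1 above, the sketch below estimates how many 800G uplinks a switch needs to remain non-blocking when its server-facing ports run at 400G. The port counts are hypothetical examples, not a NADDOD reference design.

```python
# Sketch: non-blocking uplink count for a two-tier (leaf-spine) fabric.
# Hypothetical example values; real designs depend on the switch ASIC and
# the oversubscription ratio the operator is willing to accept.

import math

def uplinks_needed(server_ports: int, server_speed_gbps: int,
                   uplink_speed_gbps: int, oversubscription: float = 1.0) -> int:
    """Uplinks required so that uplink capacity >= downlink capacity / oversubscription."""
    downlink_capacity = server_ports * server_speed_gbps
    return math.ceil(downlink_capacity / (uplink_speed_gbps * oversubscription))

if __name__ == "__main__":
    # Example: 32 x 400G server-facing ports with 800G uplinks at 1:1 (non-blocking).
    print(uplinks_needed(32, 400, 800))   # -> 16 x 800G uplinks
```

The same arithmetic explains why 800G upgrades in the upper tier pull 400G demand along with them: every 800G uplink terminates traffic that arrives on 400G ports below it.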

 

[Figure: 400G/800G optical module market outlook]

 

NADDOD offers high-quality 400G/800G optical modules built with Broadcom VCSELs, backed by a large inventory for rapid delivery and cost-effective customized network connectivity solutions.

 

400G/800G Optical Module Solution for a Typical Data Center

 

The diagram below illustrates a solution for upgrading to an 800G data center. QDD-FR4-400G optical modules form high-bandwidth 400G links between the MSN4410-WS2FC switch in the backbone layer and the high-performance 800G switch in the core layer.

 

These optical modules use the high-density QSFP-DD form factor, allowing dense deployments that increase transmission capacity and overall bandwidth. In addition, by employing PAM4 modulation and retiming, they achieve faster data transmission rates while significantly reducing latency, improving overall system performance.
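
To make the PAM4 point concrete, the arithmetic below sketches how 400G and 800G modules reach their nominal rates from per-lane symbol rates. The 53.125 GBd symbol rate and the 256b/257b plus RS(544,514) FEC overheads are the standard IEEE 802.3 parameters for 100 Gb/s-per-lane signalling, quoted here purely for illustration.

```python
# Sketch: PAM4 line-rate arithmetic for 400G/800G modules.
# Standard IEEE 802.3 100 Gb/s-per-lane parameters, used here for illustration only.

BITS_PER_PAM4_SYMBOL = 2          # PAM4 encodes 2 bits per symbol
SYMBOL_RATE_GBD = 53.125          # per-lane symbol rate in gigabaud

def line_rate_gbps(lanes: int) -> float:
    """Raw module line rate: lanes * symbol rate * bits per symbol."""
    return lanes * SYMBOL_RATE_GBD * BITS_PER_PAM4_SYMBOL

def payload_rate_gbps(lanes: int) -> float:
    """Nominal payload after removing 256b/257b and RS(544,514) FEC overhead."""
    return line_rate_gbps(lanes) * (256 / 257) * (514 / 544)

if __name__ == "__main__":
    print(line_rate_gbps(4), payload_rate_gbps(4))   # ~425 Gb/s raw -> 400 Gb/s payload
    print(line_rate_gbps(8), payload_rate_gbps(8))   # ~850 Gb/s raw -> 800 Gb/s payload
```

In other words, an 800G module is eight 100G PAM4 lanes running in parallel, which is why the same lane technology scales cleanly from 400G to 800G.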

 

[Figure: 400G/800G data center solution]

 

The New Era of 800G/400G Optical Modules

 

With the continuous growth in demand for faster and more efficient data transmission, the era of 800G/400G optical modules has fully arrived. These modules are favored for their outstanding bandwidth, advances in LPO technology, and economic benefits, and they are poised to reshape the AI field and redefine data centers. With high-speed optical modules, fully developing and training large AI models is no longer just a concept.