AI Computing Power Boosts 800G Optical Transceiver Demand Growth

NVIDIA's stock price has risen by more than 200% this year, reflecting strong demand for GPUs and other computing hardware driven by AI workloads. GPUs are only part of the picture: the AI server clusters in data centers also need high-speed interconnects to move data during AI training. As data centers grow in scale and adopt new network architectures, the number of optical transceivers they use has grown exponentially.

Over the past decade, optical transceiver speeds have evolved rapidly. Before 2015, data center optical transceivers were mostly 10G and 40G. Deployment of 25G and 100G transceivers began in 2016. By 2019, 100G transceivers were in wide use and 200G and 400G products had begun shipping. By 2022, 200G and 400G products were extensively deployed, and 800G optical transceivers had entered the mass production and adoption phase.

Why do we need 800G optical transceivers?

Optical transceivers are used primarily for data transmission in data centers. Data center traffic is commonly divided into two types: north-south traffic flows between external users and the servers inside a data center, while east-west traffic flows between servers within a data center and between data centers.

As data center capacity grew, east-west traffic was projected to account for about 85% of all data center traffic by 2021: 71.5% from server traffic within data centers and 13.6% from traffic between data centers. Note, however, that this projection predates the boom in large AI models ignited by ChatGPT at the end of 2022, so the share of east-west traffic, and of server traffic within data centers in particular, is expected to rise further.
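The projected split can be checked with a little arithmetic. This minimal Python sketch adds the two east-west components quoted above and, assuming the remaining traffic category is north-south (data center to user) so that all categories sum to 100%, derives the north-south share as the remainder. The function name `traffic_split` is illustrative only.

```python
# Projected 2021 shares (percent of total data center traffic) quoted in the text.
WITHIN_DC = 71.5   # server-to-server traffic inside a data center (east-west)
DC_TO_DC = 13.6    # traffic between data centers (east-west)


def traffic_split(within_dc: float, dc_to_dc: float) -> tuple[float, float]:
    """Return (east_west, north_south) shares in percent.

    Assumption: within-DC, DC-to-DC, and DC-to-user (north-south) traffic
    together make up 100%, so north-south is whatever remains.
    """
    east_west = within_dc + dc_to_dc
    north_south = 100.0 - east_west
    # Round to one decimal place to hide floating-point noise.
    return round(east_west, 1), round(north_south, 1)


if __name__ == "__main__":
    ew, ns = traffic_split(WITHIN_DC, DC_TO_DC)
    print(f"east-west: {ew}%  north-south: {ns}%")
```

Under that assumption, east-west traffic comes to 85.1% and north-south traffic to the remaining 14.9%.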

As computing scale and east-west traffic keep growing, data center network architectures continue to evolve. In the traditional three-tier topology, traffic between servers must pass through access switches, aggregation switches, and core switches, which places heavy load on the aggregation and core layers.

[Figure: 3-tier architecture vs. 2-tier (leaf-spine) architecture]

If the server cluster keeps expanding under the traditional three-tier topology, high-performance devices must be deployed at the core and aggregation layers, which sharply increases equipment costs. This is where the leaf-spine topology comes into play: it flattens the traditional three-tier topology into a two-tier architecture.

In this architecture, leaf switches take the place of the access switches in the traditional three-tier design and connect directly to servers. Spine switches take the place of the core switches, but they connect directly to the leaf switches, and every spine switch must connect to every leaf switch.
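To make the wiring rule concrete, here is a minimal Python sketch that models a leaf-spine fabric as a set of leaf-to-spine links. The switch names (`leaf0`, `spine0`, and so on), the helper functions, and the 4-leaf, 2-spine size are hypothetical. Because every spine connects to every leaf and leaves never connect to each other, traffic between any two leaves crosses exactly one spine, and there is one equal-cost path per spine. For comparison, the worst-case path between servers in different pods of a three-tier network crosses five switches.

```python
from itertools import product

# Worst-case switch path between servers in different pods of a
# traditional three-tier network.
THREE_TIER_WORST_CASE = ["access", "aggregation", "core", "aggregation", "access"]


def leaf_spine_links(num_leaves: int, num_spines: int) -> set[tuple[str, str]]:
    """Build the full mesh between tiers: every spine links to every leaf."""
    return {(f"leaf{leaf}", f"spine{spine}")
            for leaf, spine in product(range(num_leaves), range(num_spines))}


def leaf_to_leaf_paths(links: set[tuple[str, str]], src: str, dst: str) -> list[list[str]]:
    """Return every shortest path between two different leaves.

    Leaves never link to each other, so each path is leaf -> spine -> leaf,
    through any spine that both leaves connect to.
    """
    src_spines = {spine for (leaf, spine) in links if leaf == src}
    dst_spines = {spine for (leaf, spine) in links if leaf == dst}
    return [[src, spine, dst] for spine in sorted(src_spines & dst_spines)]


if __name__ == "__main__":
    links = leaf_spine_links(num_leaves=4, num_spines=2)
    for path in leaf_to_leaf_paths(links, "leaf0", "leaf3"):
        print(" -> ".join(path))
    print("three-tier worst case:", " -> ".join(THREE_TIER_WORST_CASE))
```

With two spines, `leaf0` reaches `leaf3` over two three-switch paths, one through each spine. Adding a spine adds one more equal-cost path between every pair of leaves.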

The number of downlink ports on each spine switch determines the maximum number of leaf switches, while the number of uplink ports on each leaf switch determines the number of spine switches. Together, they determine the scale of the leaf-spine network.
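The sizing rule above can be written as a small calculation. This Python sketch assumes exactly one link between each leaf-spine pair and one server per leaf downlink port. The function `size_leaf_spine` and the example port counts (64-port spines, leaves with 8 uplinks and 32 downlinks) are hypothetical and do not describe any particular switch model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricSize:
    spines: int
    max_leaves: int
    max_servers: int


def size_leaf_spine(spine_downlinks: int, leaf_uplinks: int, leaf_downlinks: int) -> FabricSize:
    """Size a two-tier leaf-spine fabric.

    Assumes exactly one link between each leaf-spine pair and one server
    per leaf downlink port.
    """
    spines = leaf_uplinks                       # each leaf uplink goes to a different spine
    max_leaves = spine_downlinks                # each spine port goes to a different leaf
    max_servers = max_leaves * leaf_downlinks   # every leaf fully populated with servers
    return FabricSize(spines, max_leaves, max_servers)


if __name__ == "__main__":
    # Hypothetical port counts: 64-port spines; leaves with 8 uplinks and 32 downlinks.
    print(size_leaf_spine(spine_downlinks=64, leaf_uplinks=8, leaf_downlinks=32))
```

With these numbers the script prints `FabricSize(spines=8, max_leaves=64, max_servers=2048)`: the spine port count caps the number of leaves, and the leaf uplink count fixes the number of spines.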

The leaf-spine architecture makes data transmission between servers more efficient, because traffic between any two leaf switches crosses exactly one spine switch. It also improves data center scalability: when more servers are needed, the network grows by adding switches (more leaves for more servers, more spines for more bandwidth) rather than by upgrading to ever more powerful core devices. The main drawback is that, compared with the traditional three-tier topology, the leaf-spine architecture requires far more ports. Consequently, both servers and switches require more optical transceivers for fiber optic communication.
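A rough count shows why the port requirement translates into transceiver demand. This Python sketch assumes every link is optical with one transceiver at each end and one link per leaf-spine pair; in practice some short server links may use copper cables instead, which the `optical_server_links` flag lets you exclude. The function name and the 64-leaf, 8-spine, 32-servers-per-leaf example are hypothetical.

```python
def fabric_transceivers(num_leaves: int, num_spines: int, servers_per_leaf: int,
                        optical_server_links: bool = True) -> int:
    """Count transceivers in a leaf-spine fabric.

    Assumes every counted link is optical with one transceiver at each end,
    and exactly one link per leaf-spine pair. Set optical_server_links=False
    to count only the leaf-spine links (e.g. if servers attach over copper).
    """
    leaf_spine_links = num_leaves * num_spines   # full mesh between the two tiers
    server_links = num_leaves * servers_per_leaf if optical_server_links else 0
    return 2 * (leaf_spine_links + server_links)


if __name__ == "__main__":
    base = fabric_transceivers(num_leaves=64, num_spines=8, servers_per_leaf=32)
    grown = fabric_transceivers(num_leaves=64, num_spines=9, servers_per_leaf=32)
    print(base, grown - base)   # total modules, and extra modules for one added spine
```

Under these assumptions the example fabric needs 5,120 transceivers, and adding a single spine adds 128 more, because the new spine must link to all 64 leaves.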

AI training relies heavily on GPUs. NVIDIA's DGX H100 server integrates eight H100 GPUs, and each such compute node requires approximately eight 800G OSFP optical transceivers when the corresponding switch-side modules are included.
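The per-node figure scales linearly with cluster size. This Python sketch reuses the approximate ratio stated above (about eight 800G modules per eight-GPU DGX H100 node, switch side included); the cluster sizes are hypothetical, and real counts depend on the actual network design.

```python
import math

GPUS_PER_NODE = 8           # DGX H100: eight H100 GPUs per server
TRANSCEIVERS_PER_NODE = 8   # approximate 800G OSFP count per node, switch side included


def estimate_800g_modules(num_gpus: int) -> int:
    """Rough 800G transceiver count for a cluster of DGX H100-style nodes."""
    nodes = math.ceil(num_gpus / GPUS_PER_NODE)   # a partial node still needs a full server
    return nodes * TRANSCEIVERS_PER_NODE


if __name__ == "__main__":
    for gpus in (256, 1024, 4096):   # hypothetical cluster sizes
        print(gpus, "GPUs ->", estimate_800g_modules(gpus), "x 800G modules")
```

At this ratio, demand works out to roughly one 800G module per GPU, so a 4,096-GPU cluster would need on the order of 4,096 modules.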

Two factors have accelerated the adoption of 800G optical transceivers. First, the leaf-spine architecture has sharply increased the number of high-speed optical transceivers that data centers require, a trend driven especially by large AI model training. Second, GPU servers demand higher transmission rates.

Summary

With demand for GPUs firmly established by current orders, 800G optical transceivers will enter large-scale shipment starting in the second half of this year. As the bridge that carries AI computing power, 800G optical transceivers will continue to be deployed at an accelerating pace as data centers grow in scale, AI training demand keeps rising, and the market expands.

As a leading provider of comprehensive optical networking solutions, NADDOD is committed to delivering innovative computing and networking solutions, providing timely access to faster, higher-quality 800G optical transceivers, and consistently delivering excellent products, solutions, and technical services for data centers, high-performance computing, edge computing, artificial intelligence, and other fields, helping build an intelligent, interconnected digital world.