Optical Modules for AI Computing: Naddod's 800G/400G Production Line - NADDOD Blog

Optical Module Products for AI Computing

NADDOD Nathan Optics Application Engineer Jan 12, 2024

The widespread adoption of large AI models, exemplified by ChatGPT, is driving a rapid increase in demand for computing power, and the server supply chain stands to be a major beneficiary. As AI computing spreads, optical modules in that supply chain are expected to move up to 800G to deliver higher data transmission rates. Meeting ever-growing compute requirements therefore depends on continued progress in optical module technology, which is a key enabler of efficient, high-speed data processing.

 

1. Forecast for Optical Module Market Demand Driven by Computing Network

Optical modules are essential components for interconnects both within and between data centers. The mainstream products on the market today are 100G and 400G modules, while 800G modules have so far been used mainly in fields such as supercomputing. According to LightCounting's forecast, the global adoption rate of 800G modules is estimated at only 0.62% for 2023. However, large AI models such as ChatGPT are placing new demands on data flows both within and between data centers, which is expected to accelerate the adoption of 800G optical modules. By the end of 2025, 800G optical modules are projected to dominate the market.

 

Optical module segment market size forecast

According to LightCounting data, the global optical module market grew steadily from $5.86 billion in 2016 to $6.67 billion in 2020. It is projected to reach $11.3 billion by 2025, roughly 1.7 times its 2020 size. By segment, the data communications market dominates with approximately 60% of the total, while the telecommunications market accounts for around 40%. These figures reflect the growing importance of optical modules to both sectors, driven by factors such as the proliferation of AI applications, cloud computing, and demand for high-speed data transmission and interconnectivity.
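The growth figures above can be checked with a few lines of arithmetic. The sketch below uses only the three LightCounting values quoted in the text; the compound annual growth rate (CAGR) is a derived figure added here for illustration and is not reported in the source, and the function names are hypothetical.

```python
# Market sizes quoted in the text, in billions of USD.
market_usd_bn = {2016: 5.86, 2020: 6.67, 2025: 11.3}


def growth_multiple(start_year: int, end_year: int) -> float:
    """Ratio of the end-year market size to the start-year market size."""
    return market_usd_bn[end_year] / market_usd_bn[start_year]


def cagr(start_year: int, end_year: int) -> float:
    """Compound annual growth rate implied by the two endpoints.

    Derived for illustration only; the source does not quote a CAGR.
    """
    years = end_year - start_year
    return growth_multiple(start_year, end_year) ** (1 / years) - 1


multiple_2020_2025 = growth_multiple(2020, 2025)
print(f"2025 vs 2020: {multiple_2020_2025:.2f}x")
print(f"Implied CAGR 2016-2020: {cagr(2016, 2020):.1%}")
print(f"Implied CAGR 2020-2025: {cagr(2020, 2025):.1%}")
```

The multiple rounds to 1.7, which is where the "1.7 times" figure comes from.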

 

Global optical module market and forecast

2. Application of Optical Modules in AI Computing

Let's take NVIDIA's SuperPOD as an example to calculate the ratio of GPUs to optical modules. This calculation only considers the optical modules needed for the InfiniBand (IB) network. We'll use a cluster of 140 nodes, where each node is a server with 8 GPUs, so the total number of GPUs is 140 × 8 = 1120. The cluster is divided into 7 scalable units (SUs), with 20 servers in each SU.

 

In the IB network architecture, a complete fat-tree topology is implemented. The optimal configuration for the fat-tree architecture in training scenarios is to have an equal number of uplink and downlink ports, creating a non-blocking network. Specifically:

 

  • First layer: Each SU is equipped with 8 leaf switches, totaling 56 leaf switches.

 

  • Second layer: Every 10 spine switches form a Spine Group (SG). The first leaf switch of each SU is connected to every switch in SG1, the second leaf switch of each SU is connected to every switch in SG2, and so on. With 8 SGs, there are a total of 80 spine switches.

 

  • Third layer: Every 14 core switches form a Core Group (CG). With 2 CGs, there are a total of 28 core switches.
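The three layers above can be tallied in a short sketch. The class and field names are hypothetical. The figure of 8 spine groups is inferred from the wiring rule (one SG per leaf position in an SU, with 10 switches per SG), and the figure of 2 core groups is inferred from 28 core switches at 14 per group; neither is stated directly in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperPodFabric:
    """Illustrative model of the 140-node IB fat-tree described in the text."""

    scalable_units: int = 7
    nodes_per_su: int = 20
    gpus_per_node: int = 8
    leaf_per_su: int = 8
    spine_per_group: int = 10
    core_per_group: int = 14
    core_groups: int = 2  # assumption: inferred from 28 core switches / 14 per group

    @property
    def gpus(self) -> int:
        return self.scalable_units * self.nodes_per_su * self.gpus_per_node

    @property
    def leaf_switches(self) -> int:
        return self.scalable_units * self.leaf_per_su

    @property
    def spine_groups(self) -> int:
        # Leaf i of every SU connects to SG i, so there is one SG per leaf position.
        return self.leaf_per_su

    @property
    def spine_switches(self) -> int:
        return self.spine_groups * self.spine_per_group

    @property
    def core_switches(self) -> int:
        return self.core_groups * self.core_per_group


fabric = SuperPodFabric()
print("GPUs:", fabric.gpus)
print("Leaf switches:", fabric.leaf_switches)
print("Spine switches:", fabric.spine_switches)
print("Core switches:", fabric.core_switches)
```

With the default values this reproduces the 1120 GPUs and the 56 leaf, 80 spine, and 28 core switches given in the text.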

 

For both the compute and storage sides, all cables are active optical cables (AOCs). Each port therefore corresponds to one optical module, and each cable corresponds to 2 optical modules. The total number of optical modules required for the compute and storage sides is: (1120 + 1124 + 1120) × 2 + (280 + 92 + 288) × 2 = 8048. With 1120 GPUs, the ratio of GPUs to 200G optical modules is approximately 1:7.2, or about 7.2 modules per GPU.
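The module count can be reproduced directly from the expression in the text. This is a minimal sketch: the text does not label the three numbers in each bracket, so treating them as per-layer cable counts is an assumption, and the variable names are hypothetical.

```python
# Cable counts copied from the expression in the text.
# Assumption: each list holds the cable count for one network layer.
compute_cables = [1120, 1124, 1120]
storage_cables = [280, 92, 288]

MODULES_PER_CABLE = 2  # an AOC has one optical module at each end
GPUS = 140 * 8         # 140 nodes with 8 GPUs each


def modules_needed(cables_per_layer: list[int]) -> int:
    """Total optical modules for a set of cable counts."""
    return sum(cables_per_layer) * MODULES_PER_CABLE


compute_modules = modules_needed(compute_cables)
storage_modules = modules_needed(storage_cables)
total_modules = compute_modules + storage_modules
modules_per_gpu = total_modules / GPUS

print("Compute-side modules:", compute_modules)
print("Storage-side modules:", storage_modules)
print("Total modules:", total_modules)
print(f"Modules per GPU: {modules_per_gpu:.1f}")
```

The compute side accounts for 6728 modules and the storage side for 1320, which sum to the 8048 quoted in the text.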

 

The DGX GH200 supercomputer is equipped with 256 Superchips, each of which can be treated as a server interconnected through switches. Structurally, it adopts a two-layer fat-tree topology with 96 switches in the first layer (L1) and 36 switches in the second layer (L2). Each switch has 32 ports running at 800G. The supercomputer is also equipped with 24 IB switches for the IB network.

The module count can be estimated from the number of ports. Assume the L1 layer uses copper cables and therefore needs no optical modules. In the L2 layer, under the non-blocking fat-tree architecture, every L2 switch port connects to an uplink port of an L1 switch, and each link needs a module at both ends, so 36 × 32 × 2 = 2304 800G optical modules are required. For the IB network, the 24 switches require 24 × 32 = 768 800G optical modules. In total, the DGX GH200 requires 2304 + 768 = 3072 800G optical modules, or 12 800G optical modules per Superchip.

For comparison, GPT-3 requires over 80,000 200G optical modules for daily training. Allowing for a FLOPS utilization rate of 20%-30%, the required number of optical modules rises to approximately 350,000.
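The DGX GH200 port arithmetic can be worked through in the same way. This is a minimal sketch using only the switch and port counts from the text, with illustrative constant names. Evaluating the L2 expression 36 × 32 × 2 gives 2304 modules, and adding the 768 IB modules gives 3072, which is the total that corresponds to 12 modules per Superchip.

```python
SUPERCHIPS = 256
L2_SWITCHES = 36
IB_SWITCHES = 24
PORTS_PER_SWITCH = 32  # each port runs at 800G

# L1 is assumed to use copper, so only L2 links and the IB network need optics.
# Each L2 port terminates a link that needs a module at both ends.
l2_modules = L2_SWITCHES * PORTS_PER_SWITCH * 2

# One module per IB switch port, as in the text's expression.
ib_modules = IB_SWITCHES * PORTS_PER_SWITCH

total_800g_modules = l2_modules + ib_modules
modules_per_superchip = total_800g_modules / SUPERCHIPS

print("L2 modules:", l2_modules)
print("IB modules:", ib_modules)
print("Total 800G modules:", total_800g_modules)
print("Modules per Superchip:", modules_per_superchip)
```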

 

3. NADDOD Optical Modules for AI Computing

AI development has driven demand for a massive number of optical modules. As a professional optical module supplier, NADDOD completed construction of its 800G/400G AI application optical module production line in 2023. The new production line is mainly dedicated to manufacturing 4×100G PAM4 and 8×100G PAM4 optical modules and active optical cables.

 

InfiniBand Optical Transceiver NADDOD

Thanks to the support of its supply chain, NADDOD's series of AI optical modules has entered mass production smoothly. The company's diversified product design, order handling, and supply chain capabilities are well matched, allowing it to provide flexible, effective delivery and sample testing for large, medium-sized, and small customers.

 

Ethernet Optical Transceiver NADDOD

NADDOD has abundant and stable inventory, ensuring fast delivery. We have successfully delivered products to multiple enterprises and guarantee delivery within two weeks of an order being placed. Moreover, every product is thoroughly tested on real devices, including scenarios with tens of thousands of simultaneous applications, to ensure smooth operation in real-world deployments. Our optical modules have even lower bit error rates than those from original equipment manufacturers (OEMs), and their power consumption and loss are superior to those of other products available on the market. With numerous successful deliveries and real-world application cases, you can trust the quality of our products and our inventory. In addition to offering high-quality third-party optical modules, we also have sufficient stock of original NVIDIA products. Feel free to inquire for more information.