How to face the design challenges of 800G Ethernet?
With the increase in the number of users and devices, the improvement of access rates, the diversification of access methods, and the emergence of more diverse service demands, high-performance computing is gradually entering application fields such as AI, automation, and device packaging. New ways of using and processing complex data are also emerging.
As heterogeneous integration and chiplets become popular in response to yield and cost pressures, new use cases for 800G are emerging. The design challenges of 800G affect chip design engineers in many ways.
Adoption of 800G Ethernet is Expected to Double
In the IEEE 802.3 Ethernet Bandwidth Assessment report, NADDOD found that adoption of 800G and 1.6T is expected to double and quadruple, respectively, within 9 years. The application area with the largest increase in bandwidth demand is data center data exchange, which has grown 16.3 times in 8 years. The following table shows the growth values:
Short-reach connections of up to 5 meters within data center rack units (RU) mainly use copper cabling, while optical devices are used elsewhere. Within the RU, switch capacity will increase from 12.8T to 25.6T, 51.2T, and 102.4T. Pluggable and co-packaged optical devices follow the same trend, growing from 400G-800G to 1.6T and beyond. Previously, a 12.8T switch required 32 instances of 8 x 50G SerDes (32 x 8 x 50G = 12.8T). Next-generation switches will instead use 112G or even 224G SerDes, which offer advantages such as smaller area, lower cost, lower power consumption, shorter time to market, and higher speed.
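The relationship between switch capacity and SerDes count is simple division, and a short sketch makes the scaling visible. The function name and the nominal payload rates per lane (50G, 100G for a "112G" SerDes, 200G for a "224G" SerDes) are assumptions made for illustration.

```python
def serdes_lanes(switch_capacity_gbps: int, lane_rate_gbps: int) -> int:
    """Number of SerDes lanes needed to carry the full switch capacity."""
    lanes, remainder = divmod(switch_capacity_gbps, lane_rate_gbps)
    if remainder:
        raise ValueError("capacity is not a whole number of lanes")
    return lanes


if __name__ == "__main__":
    # Assumption: nominal payload per lane is 50G, 100G ("112G" SerDes)
    # or 200G ("224G" SerDes).
    for capacity in (12_800, 25_600, 51_200, 102_400):
        for lane_rate in (50, 100, 200):
            lanes = serdes_lanes(capacity, lane_rate)
            print(f"{capacity / 1000:6.1f}T at {lane_rate:3d}G per lane: {lanes:4d} lanes")
```

For a fixed capacity, each doubling of the per-lane rate halves the number of lanes that must be placed and routed, which is where the area and power advantages come from.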
The IEEE 802.3 working group has defined the 400G standard, while the Ethernet Technology Consortium has defined and released the higher-speed 800G specification. The 400G IEEE 802.3 standard uses multi-lane distribution (MLD) to distribute data from a single media access control (MAC) channel to 16 physical coding sublayer (PCS) channels. The 800G specification from the Ethernet Technology Consortium uses a MAC extended to 800 Gb/s and two modified 400G PCS instances to drive 8x100G channels. Together, the two PCS instances have 32 channels (2 x 16, the channel count of the 400G PCS) and reuse the forward error correction (FEC) of the 400G standard, which is based on RS(544,514) coding.
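The rate arithmetic behind this structure can be sketched in a few lines. The sketch assumes the 256b/257b transcoding step that 400GBASE-R applies before RS(544,514) FEC is also applied on the 800G path, and it ignores alignment markers; the helper names are illustrative only.

```python
from fractions import Fraction

RS_N, RS_K = 544, 514            # RS(544,514): codeword / message length in 10-bit symbols
TRANSCODE = Fraction(257, 256)   # assumption: 256b/257b transcoding ahead of the FEC


def line_rate_gbps(mac_rate_gbps: int) -> Fraction:
    """Aggregate rate on the wire after transcoding and FEC parity are added."""
    return mac_rate_gbps * TRANSCODE * Fraction(RS_N, RS_K)


# A Reed-Solomon code corrects up to (n - k) / 2 symbol errors per codeword.
correctable_symbols = (RS_N - RS_K) // 2

if __name__ == "__main__":
    for mac_rate, pcs_lanes, physical_lanes in ((400, 16, 8), (800, 32, 8)):
        total = line_rate_gbps(mac_rate)
        print(f"{mac_rate}G MAC: {float(total)} Gb/s total, "
              f"{float(total / pcs_lanes)} Gb/s per PCS lane, "
              f"{float(total / physical_lanes)} Gb/s per physical lane")
    print("correctable symbols per codeword:", correctable_symbols)
```

Under these assumptions an 800G link carries 850 Gb/s on the wire, or 106.25 Gb/s on each of the eight physical lanes, which is why a "100G" lane is paired with a "112G" SerDes.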
Ethernet Layer and Configuration for 400G/800G Data Rates
A complete Ethernet IP subsystem consists of a PHY and MAC. The PHY consists of PCS + SerDes, where SerDes includes PMA and PMD, as shown in the diagram below. An Ethernet IP subsystem compatible with IEEE 802.3 covers a wide range, from simple systems with 100G MAC/PCS and 50G SerDes to more complex 800G Ethernet subsystems with multiple MAC/PCS (with different configurations) and 56G/112G SerDes.
From the architecture view shown in the diagram, Ethernet resides in the bottom two layers of the seven-layer Open Systems Interconnection (OSI) model: the physical layer and the data link layer.
Ethernet Layer in the Open Systems Interconnection (OSI) Model
The physical layer (including PMD, PMA, and PCS) sends and receives unstructured raw bit streams over the physical medium. Functions such as serialization, auto-negotiation, and link training are implemented at the physical layer. The PMD handles media ranging from short-reach cables and backplanes to long-reach optical fiber. It is a medium-dependent serial interface that performs bit timing and signal encoding. The next sublayer above the PMD is the PMA, whose rate and number of channels are configurable. The PMA also performs local and remote loopback tests, data framing, and test pattern generation.
High-speed SerDes (composed of PMA and PMD) are typically 56G or 112G and can be configured in 1/2/4 channel configurations as x1/x2/x4 SerDes. Low-speed SerDes can be used for 10G, 25G, and 32G PHY.
The PCS passes information to and from the MAC or other PCS clients (such as repeaters). It performs frame delineation, encoding/decoding (such as 8b/10b or 64b/66b), transmission of fault indications, deskew of received data across channels, and data recovery.
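To make the cost of the two line codes mentioned above concrete, the following sketch computes their efficiency and the resulting serial rate for a given payload rate. The function names are illustrative, and the two sample rates (1G with 8b/10b, 10G with 64b/66b) are chosen only as familiar reference points.

```python
from fractions import Fraction


def efficiency(payload_bits: int, coded_bits: int) -> Fraction:
    """Fraction of transmitted bits that carry payload."""
    return Fraction(payload_bits, coded_bits)


def overhead(payload_bits: int, coded_bits: int) -> Fraction:
    """Extra bits sent per payload bit."""
    return Fraction(coded_bits - payload_bits, payload_bits)


def coded_rate_gbps(payload_rate_gbps: Fraction, payload_bits: int, coded_bits: int) -> Fraction:
    """Serial rate needed to carry the payload rate after line coding."""
    return payload_rate_gbps * Fraction(coded_bits, payload_bits)


if __name__ == "__main__":
    for name, p, c, rate in (("8b/10b", 8, 10, 1), ("64b/66b", 64, 66, 10)):
        print(f"{name}: efficiency {float(efficiency(p, c)):.4f}, "
              f"overhead {float(overhead(p, c)):.4%}, "
              f"{rate}G payload needs {float(coded_rate_gbps(Fraction(rate), p, c))} Gb/s")
```

The much lower overhead of 64b/66b is one reason it, rather than 8b/10b, is the basis for the higher-rate PCS encodings.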
High-speed PCS typically supports data rates of 200G/400G/800G, while the rate range for low-speed PCS is from 1G to 100G. High-speed PCS usually has a configurable number of channels, and each channel can operate independently at different rates. For example, a 400G PCS can have any of the following configurations:
- 400G, 8x50G SerDes
- 2x200G, 4x50G SerDes
- 2x200G, 8x25G SerDes
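One way to reason about such options is to model each as a small record and check that the lanes add up to the port rate. This is only a sketch: the `PcsConfig` type is hypothetical, and it assumes the SerDes counts in the list above are per port (for example, two 200G ports that each use 4x50G).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcsConfig:
    """Hypothetical description of one PCS operating mode."""
    ports: int
    port_rate_gbps: int
    lanes_per_port: int   # assumption: lane counts in the list are per port
    lane_rate_gbps: int

    def is_consistent(self) -> bool:
        # Each port's lanes must carry exactly the port rate.
        return self.port_rate_gbps == self.lanes_per_port * self.lane_rate_gbps

    def total_gbps(self) -> int:
        return self.ports * self.port_rate_gbps

    def total_lanes(self) -> int:
        return self.ports * self.lanes_per_port


CONFIGS = (
    PcsConfig(ports=1, port_rate_gbps=400, lanes_per_port=8, lane_rate_gbps=50),
    PcsConfig(ports=2, port_rate_gbps=200, lanes_per_port=4, lane_rate_gbps=50),
    PcsConfig(ports=2, port_rate_gbps=200, lanes_per_port=8, lane_rate_gbps=25),
)

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg, "consistent:", cfg.is_consistent(),
              "total:", cfg.total_gbps(), "Gb/s over", cfg.total_lanes(), "lanes")
```

Under the per-port assumption, all three modes deliver 400 Gb/s in aggregate; they differ only in how many ports and SerDes lanes are used.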
Data Link Layer
The data link layer (including the MAC layer and the Logical Link Control (LLC) layer) provides direct interconnection for data transfer between nodes. In addition to flow control, the MAC layer detects errors in the data received from the physical layer. The MAC layer supports data rates of 200G/400G/800G as well as lower speeds ranging from 10M to 100G. The MAC configuration options map to the PCS configuration options mentioned above.
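The MAC's error check on received frames is the frame check sequence (FCS), a CRC-32 appended to every Ethernet frame. The sketch below uses Python's `zlib.crc32`, which implements the same CRC-32 polynomial; the helper names are hypothetical, the FCS is assumed to be stored least significant byte first, and minimum-frame padding is skipped for brevity.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return the frame with its 4-byte CRC-32 frame check sequence appended."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    # Assumption: FCS bytes are stored least significant byte first.
    return frame + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the frame body and compare with the trailer."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer


if __name__ == "__main__":
    # Destination MAC, source MAC, EtherType, then payload (all illustrative values).
    frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"hello, 800G"
    good = append_fcs(frame)
    corrupted = bytes([good[0] ^ 0x01]) + good[1:]   # flip one bit
    print("intact frame passes:   ", fcs_ok(good))
    print("corrupted frame passes:", fcs_ok(corrupted))
```

A frame that fails this check is dropped by the MAC rather than repaired; correcting bit errors is the job of the FEC in the physical layer.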
Considerations for Ethernet Configuration Design
From the number of options mentioned above, it is evident that the use cases for Ethernet are complex and diverse. For example, for a 51.2T Ethernet switch operating at 100 Gbps line rate, we find that Ethernet can be implemented in at least three different configurations, as shown in the diagram.
Configuration 1 - Single Chiplet Solution:
This places 512 100G SerDes channels along all edges of a single chiplet, using 128 instances of x4 112G Long-Reach (LR) SerDes and a four-lane or eight-lane PCS & MAC. Factors to consider include the available edge length, floorplanning that allows optimal routing, MAC/PCS placement, and the feasibility of global timing closure.
Configuration 2 - Dual Chiplet Solution:
This is a dual-chiplet implementation connected by 112G Extra Short-Reach (XSR) SerDes. Each chiplet includes 64 800G subsystems, with a single subsystem integrating x4 112G LR SerDes and a four-lane or eight-lane PCS & MAC. The advantage of a multi-chiplet implementation is increased available edges, and each chiplet has a higher yield compared to a single chiplet solution.
Configuration 3 - Chiplet-on-Substrate Solution:
This involves connecting the main chiplet with 112G XSR SerDes to eight accompanying chiplets. Each accompanying chiplet consists of 16 x4 112G LR SerDes instances and a four-lane or eight-lane PCS & MAC. The advantage is that the main chiplet can use a more advanced process node, while the accompanying chiplets can use older but more mature process nodes.
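A quick arithmetic check confirms that the instance counts given for Configurations 1 and 3 both add up to 51.2T. The sketch assumes each channel of a x4 112G LR SerDes carries a nominal 100 Gb/s, as in Configuration 1; the function name is illustrative.

```python
LANE_GBPS = 100          # assumption: nominal payload per 112G SerDes channel
LANES_PER_INSTANCE = 4   # x4 SerDes


def total_capacity_gbps(x4_instances_per_chiplet: int, chiplets: int = 1) -> int:
    """Aggregate line-side capacity of all x4 SerDes instances."""
    return chiplets * x4_instances_per_chiplet * LANES_PER_INSTANCE * LANE_GBPS


# Configuration 1: 128 x4 instances on a single chiplet.
config1 = total_capacity_gbps(128)
# Configuration 3: 8 accompanying chiplets with 16 x4 instances each.
config3 = total_capacity_gbps(16, chiplets=8)

if __name__ == "__main__":
    print("Configuration 1:", config1 / 1000, "T")
    print("Configuration 3:", config3 / 1000, "T")
    print("per accompanying chiplet:", total_capacity_gbps(16) / 1000, "T")
```

The total is the same in both cases; what changes is how many SerDes instances must fit along the edges of any one chiplet (128 versus 16).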
For Configuration 3, whether the chiplet is 1.6T (4 instances of 32 x4 112G LR), 3.2T (4 instances of 16 x4 112G LR), or 6.4T (4 instances of 8 x4 112G LR), various transceiver partitioning strategies need to be explored. Reference clock routing must also be considered.
Package signal layout and routing studies are likewise essential to meet crosstalk specifications, as are power delivery network design and power integrity simulation. All of these are necessary to ensure consistent performance across chiplets.
In addition to the factors mentioned above, hardening is another crucial consideration. Hardening involves what-if analysis of transceiver partitioning to optimize chiplet edge utilization, and it includes front-end and back-end integration work using a complete design flow from RTL to GDS. A comprehensive design flow requires knowledge of SerDes, PCS, and MAC design and close work with EDA tools to meet sign-off criteria.
To improve efficiency, simplify design work, and reduce time to market, designers need to use integrated and validated 400G/800G MAC, PCS, and 56G/112G SerDes. If integration is performed by designers with the required knowledge and expertise in MAC, PCS, and SerDes functionality, configuration, and implementation, interface latency and power optimization become more straightforward.
In this regard, NADDOD recognizes that an integrated 200G/400G/800G Ethernet solution composed of MAC, PCS, and PMA/PMD IP can meet these requirements. The MAC complies with IEEE standards and is configurable to suit the needs of high-performance computing (HPC), AI, and networking SoCs. The DesignWare® 56G and 112G PHY IP have undergone silicon validation and are suitable for various advanced FinFET processes, providing excellent bit error rate (BER) while maximizing performance.
In the case of 800G Ethernet IP, the 800G optical transceiver plays a vital role. It is one of the key components used for high-speed data transmission, offering high bandwidth and low-latency fiber optic connections. As a leading provider of optical network solutions, NADDOD has top-notch capabilities in research and development, manufacturing, and technical services, giving it a competitive advantage in the field of 800G high-speed optical transceiver products. This includes three types of optical transceivers: 800G OSFP DR8+, 800G OSFP 2xFR4, and 800G OSFP DR8, all of which are advanced fiber optic transceivers used for high-performance data transmission and communication. Their shared advantages include:
- High-speed transmission: NADDOD's 800G optical transceivers support data transmission at 800 Gb/s, meeting the demands of large-scale data centers and high-bandwidth applications.
- Low latency: NADDOD's 800G optical transceivers provide low-latency data transmission, helping accelerate data processing and communication.
- High density: NADDOD's 800G optical transceivers use high-density designs, offering more ports and connectivity options for higher data throughput and better system scalability.
- Low power consumption: NADDOD's 800G optical transceivers feature low power consumption, contributing to reduced energy consumption and operational costs in data centers.
- Multi-application support: NADDOD's 800G optical transceivers are suitable for various application scenarios, including long-distance transmission, medium-distance transmission, and high-density data center deployments.
With a customer-centric approach, NADDOD continues to create excellent value for customers across industries. In 2023, NADDOD launched its 800G series of optical transceiver products and solutions for ultra-large-scale cloud data centers, serving users of 800G Ethernet designs worldwide.