Full-Stack Solution of Spectrum-X Network Platform

NADDOD Abel InfiniBand Expert Sep 11, 2023

1. NVIDIA Spectrum-4 Ethernet Switch

The Spectrum-4 switch, built on a 51.2Tbps ASIC, supports up to 128 ports of 400GbE in a single 2U chassis. Spectrum-4 is the first switch designed specifically for AI workloads, combining a purpose-built high-performance architecture, ultra-low latency, and standard Ethernet connectivity.
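
The port count and the ASIC bandwidth quoted above are consistent with each other, as a quick calculation shows. This is only an arithmetic sketch; the helper name `aggregate_tbps` is made up for illustration and is not part of any NVIDIA tool.

```python
# Port arithmetic for the figures quoted in the text.
PORT_SPEED_GBPS = 400   # one 400GbE port
PORT_COUNT = 128        # maximum port count quoted for a single 2U switch


def aggregate_tbps(port_count: int, port_speed_gbps: int) -> float:
    """Return the aggregate port bandwidth in Tbps (hypothetical helper)."""
    return port_count * port_speed_gbps / 1000


total_tbps = aggregate_tbps(PORT_COUNT, PORT_SPEED_GBPS)
print(f"{PORT_COUNT} x {PORT_SPEED_GBPS}GbE = {total_tbps} Tbps")
```

Running it should report 51.2 Tbps, matching the ASIC figure.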


Spectrum-4 offers AI-specific enhancements for RoCE (RDMA over Converged Ethernet), including:

  • RoCE Dynamic Routing
  • RoCE Performance Isolation
  • Efficient bandwidth in large-scale network deployments using standard Ethernet
  • Low latency with low jitter and short tail latency

NVIDIA Spectrum-4 400G Ethernet Switch

2. NVIDIA BlueField-3 DPU

The NVIDIA® BlueField®-3 DPU is a third-generation data center infrastructure-on-a-chip that enables enterprises to build software-defined, hardware-accelerated IT infrastructure from the cloud to the core data center to the edge. With 400Gb/s Ethernet connectivity, the BlueField-3 DPU offloads, accelerates, and isolates software-defined networking, storage, security, and management functions, significantly improving data center performance, efficiency, and security. It also supports multi-tenancy and security, addressing critical requirements for handling north-south and east-west traffic in Spectrum-X-powered cloud AI data centers.

NVIDIA BlueField-3 400G Ethernet DPU

The BlueField-3 DPU is purpose-built for AI acceleration, with an integrated all-to-all engine and NVIDIA GPUDirect® and NVIDIA® Magnum IO GPUDirect® Storage (GDS) acceleration technologies tailored for AI workloads. It also features a special NIC mode that uses onboard memory to accelerate large-scale AI clouds. These clouds use a very large number of Queue Pairs (QPs), which can be held in the DPU's local memory instead of consuming system memory. Lastly, the BlueField-3 DPU uses NVIDIA Direct Data Placement (DDP) technology to enhance RoCE dynamic routing.
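
To give a rough sense of why QP counts grow so quickly in large AI clouds, the sketch below counts QPs under a simplifying assumption that is not stated in the source: every endpoint keeps exactly one connected QP per peer (a full mesh). Real deployments may use more or fewer QPs per peer, and the function names are hypothetical.

```python
def qps_per_endpoint(endpoints: int) -> int:
    """QPs held by one endpoint, ASSUMING one connected QP per peer."""
    return max(endpoints - 1, 0)


def total_qps(endpoints: int) -> int:
    """QPs across the whole fabric under the same full-mesh assumption."""
    return endpoints * qps_per_endpoint(endpoints)


# Growth is linear per endpoint but quadratic across the fabric.
for n in (8, 256, 4096):
    print(f"{n:>5} endpoints: {qps_per_endpoint(n):>5} QPs each, "
          f"{total_qps(n):>9} in total")
```

Even under this simple model the per-endpoint connection state grows with cluster size, which is the pressure that keeping QP context in DPU-local memory is meant to relieve.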

3. NVIDIA End-to-End Physical Layer (PHY)

Spectrum-X is the only Ethernet network platform that uses the same 100G SerDes technology end to end, from switches to DPUs to GPUs. A SerDes (serializer/deserializer) converts data between parallel and serial form, which makes it a critical building block of modern high-speed data transmission. NVIDIA SerDes delivers exceptional signal integrity and an ultra-low bit error rate (BER) while significantly reducing power consumption in AI clouds. The same SerDes technology is used in NVIDIA Hopper GPUs, Spectrum-4 switches, BlueField-3 DPUs, and the Quantum InfiniBand product line, delivering unmatched energy efficiency and performance.

Using the same SerDes technology across all devices and components in a network offers several advantages:

  • Cost and energy efficiency: The NVIDIA SerDes used in Spectrum-X is optimized for energy efficiency and eliminates the need for gearboxes, which bridge channels running at different lane speeds. Gearboxes add complexity to the data path and increase both cost and power consumption. Eliminating them reduces the initial investment as well as the operating costs of power and cooling.

  • System design efficiency: Unified and advanced SerDes technology in data center infrastructures enables higher signal integrity, reducing the required system components and simplifying system design. Unified and consistent SerDes technology also simplifies operations and extends uptime.
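
The lane arithmetic behind these points can be sketched as follows. This is a deliberately simplified model: it assumes a port is built from equal-rate SerDes lanes and that a gearbox is needed only when the two ends of a channel run different lane rates. The function names are illustrative, not vendor APIs.

```python
SERDES_LANE_GBPS = 100  # per-lane rate named in the text


def lanes_per_port(port_gbps: int, lane_gbps: int = SERDES_LANE_GBPS) -> int:
    """Number of SerDes lanes needed to carry one port at the given rate."""
    if port_gbps % lane_gbps != 0:
        raise ValueError("port rate must be a multiple of the lane rate")
    return port_gbps // lane_gbps


def needs_gearbox(lane_gbps_a: int, lane_gbps_b: int) -> bool:
    """Simplified rule: mismatched lane rates need a gearbox in between."""
    return lane_gbps_a != lane_gbps_b


print(lanes_per_port(400))       # lanes in one 400GbE port
print(needs_gearbox(100, 100))   # same 100G lanes at both ends
print(needs_gearbox(100, 50))    # 100G lanes facing 50G lanes
```

With 100G lanes at every hop, a 400GbE port maps onto four lanes and the mismatch case never arises, which is the situation the text describes for Spectrum-X.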

4. NVIDIA Acceleration Software

NVIDIA NetQ Telemetry Dashboard

4.1 NetQ

NVIDIA NetQ™ is a highly scalable network operations toolkit for real-time AI network visualization, troubleshooting, and validation. Leveraging the 'NVIDIA What Just Happened' switch telemetry data and NVIDIA® DOCA™ telemetry, NetQ provides actionable insights into the operation of switches and DPUs, integrating the network into an enterprise's MLOps ecosystem. Additionally, NetQ flow telemetry maps flow paths and behaviors within switch ports and RoCE queues to analyze specific application flows. NetQ samples, analyzes, and reports latency (maximum, minimum, and average) per switch and provides detailed buffer occupancy information in flow paths. The NetQ GUI reports all possible paths, detailed information for each path, and flow behavior. Combining 'What Just Happened' with flow telemetry helps network operators proactively identify the root causes of server and application issues.
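
The per-switch latency summary described above (maximum, minimum, and average) amounts to a simple aggregation. The sketch below illustrates that idea on made-up samples; it does not use the NetQ API, and the switch names, values, and `summarize_latency` function are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (switch, latency in microseconds) samples along a flow path.
samples = [
    ("leaf01", 1.2), ("leaf01", 1.8),
    ("spine01", 0.9), ("spine01", 1.5),
    ("leaf02", 2.1),
]


def summarize_latency(samples):
    """Group latency samples by switch and report min, max, and average."""
    per_switch = defaultdict(list)
    for switch, latency_us in samples:
        per_switch[switch].append(latency_us)
    return {
        switch: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for switch, vals in per_switch.items()
    }


summary = summarize_latency(samples)
for switch, stats in summary.items():
    print(switch, stats)
```

A switch whose maximum sits far above its average is a natural first place to look when chasing tail latency.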

4.2 Spectrum Software Development Kit

The NVIDIA Ethernet Switch Software Development Kit (SDK) offers mature programmability, enabling flexible implementation of switching and routing functionality while delivering exceptional packet rate, bandwidth, and latency. Server and network OEMs, as well as network operating system (NOS) vendors, can use the SDK to build flexible, innovative, and cost-optimized switching solutions on the Ethernet switch chip family.

4.3 NVIDIA DOCA

NVIDIA® DOCA™ is the key to unlocking the potential of the NVIDIA BlueField® DPU for offloading, accelerating, and isolating data center workloads. With DOCA, developers can create software-defined, cloud-native, DPU-accelerated services with zero-trust protection, programmatically shaping data center infrastructure to meet the growing performance and security demands of modern data centers.

5. NADDOD Spectrum-X Connectivity Products

The Spectrum-X Ethernet platform provides innovative and efficient Ethernet solutions for accelerating artificial intelligence, high-performance computing, and cloud computing. As the typical Spectrum-X network topology in the diagram below shows, a large number of optical connectivity components are needed to interconnect the devices. Their job is to provide high-bandwidth, low-latency, and highly reliable connections between Ethernet elements so that the network can reach its full performance. High-quality, reliable optical connectivity products are therefore crucial when deploying the Spectrum-X network platform.

Typical Spectrum-X network topology


As a leading provider of optical network solutions, NADDOD offers optical connectivity and networking products and solutions. We have a deep understanding of the business and extensive experience in project implementation. NADDOD focuses on high-performance network construction and application acceleration, providing the best combination of high-performance switches, intelligent network cards, and AOC/DAC/optical module products and solutions for your specific application scenarios. With technical expertise and project experience in optical networks and high-performance computing, we continuously provide excellent products, solutions, and technical services for applications such as data centers, high-performance computing, edge computing, and artificial intelligence, contributing to the creation of a connected and intelligent world.

When deploying your Spectrum-X network, NADDOD can provide a comprehensive range of optical connectivity products with multiple speeds ranging from 10G to 400G. These products undergo rigorous performance testing and compatibility testing, offering exceptional performance and perfect compatibility with Spectrum-X network devices. By choosing NADDOD's high-quality and reliable optical connectivity products, you can achieve outstanding performance and reliability for your AI business deployed on the Spectrum-X network platform, enabling the rapid growth of your business!