NADDOD Helped the National Supercomputing Center Build a General-Purpose Test Platform for HPC

HPC Case Study: Nebulae Supercomputing Center

NADDOD Abel InfiniBand Expert Feb 24, 2023

Highlights

Supports the uninterrupted 7x24 operation of the supercomputing center at high speed, safely, reliably, and stably, keeping pace with the rapid development and business model innovation of the center.

Key Stats

  • More than 200 nodes, supporting 1.271 PFlop/s performance
  • 1E-15 ultra-low Bit Error Rate
  • 7x24 operation with no downtime

Overview

Nebulae is one of the most advanced high-performance computing systems in the world. In June 2010, it ranked second on the TOP500 list released at the International Supercomputing Conference, with a performance of 1.271 PFlop/s, and it reported the highest theoretical peak capability on the list at 2.98 PFlop/s. Nebulae is built from a Dawning TC3600 Blade system with Intel Xeon X5650 processors and NVIDIA Tesla C2050 GPUs used as accelerators. The National Supercomputing Center in Shenzhen supports scientific and technological innovation in biomedicine, industrial simulation, astrophysics, fluid simulation, earth science, atmospheric science, geological simulation, deep learning, genomics, integrated circuit design, and other fields. It contributes substantially to the development of the national high-performance computing (HPC) and cloud computing industry and to its competitiveness worldwide.
National Supercomputing Center in Shenzhen, China

Challenge

The supercomputing center is built around supercomputers and is mainly used for massive data processing, complex calculations, and simulations that ordinary computers and servers cannot complete.

The National Supercomputing Center targets typical high-performance computing (HPC) scenarios and is designed around massively parallel computing. Compared with ordinary computing, HPC enables users to solve complex problems in artificial intelligence, biotechnology, and scientific computing, accelerating breakthroughs in scientific research and technological innovation.

In the era of high-performance computing (HPC), computing is the power and the network is the foundation. HPC applications require powerful computing capabilities, high bandwidth, and an enhanced network. The growing business needs of the National Supercomputing Center place extremely high demands on platform reliability and business availability. At the same time, because the center works with cutting-edge technology, it sets an extremely high bar for the maturity, qualifications, technology, and operation and maintenance capabilities of its networking solution suppliers.

Solution

To further develop the resource capabilities of the supercomputing center and strengthen supercomputing applications, the National Supercomputing Center has built a General-Purpose Test Platform (GTP) to deepen scientific research innovation, cultivate industrial applications, and continuously improve the quality of its technical services.

The GTP project has two main parts: general-purpose test servers and a high-performance network. For a high-performance network cluster, the ratio of network switches, network cards, and optical connectivity assemblies is about 5:7:16 (taking a cluster of 240 100G nodes as an example). Optical connectivity assemblies therefore account for the largest proportion in both number and price, exceeding network cards and switches combined.
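As a quick check of the 5:7:16 ratio quoted above, the following minimal sketch converts the three parts into shares of the total. It assumes the ratio can be read as relative shares of the network bill of materials; the names `RATIO` and `shares` are illustrative and do not come from the project.

```python
from fractions import Fraction

# Ratio quoted in the text: switches : network cards : optical assemblies.
# Assumption: the three parts are treated as relative shares of one total.
RATIO = {"switches": 5, "network_cards": 7, "optical_assemblies": 16}


def shares(ratio):
    """Return each component's exact fraction of the total."""
    total = sum(ratio.values())
    return {name: Fraction(parts, total) for name, parts in ratio.items()}


component_shares = shares(RATIO)
optical = component_shares["optical_assemblies"]
others = component_shares["switches"] + component_shares["network_cards"]

for name, share in component_shares.items():
    print(f"{name}: {float(share):.1%}")
print("optical exceeds switches + cards:", optical > others)
```

Under this reading, optical assemblies make up 16 of 28 parts (about 57%), which is more than the 12 parts contributed by switches and network cards together.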

NADDOD, with our deep business understanding and rich project implementation experience in high-performance network construction and application acceleration, provided a high-performance computing network solution for the national supercomputing center.
Nebulae National Supercomputing Center Host Room

In this project, the national supercomputing center ran a large number of performance tests in the early stage. These tests verified that NADDOD InfiniBand (IB) active optical cables (AOCs) are fully compatible with its network cluster, and that their parameters and performance are far superior to those of products from other manufacturers, fully meeting the technical requirements of the center's high-performance network.
NADDOD High-Performance Network Solution Topology

Optical interconnects in high-performance networks are judged on three key indicators: bit error rate (BER), power consumption, and compatibility.

  • Bit error rate: such a high-performance network requires a BER of 1E-15, which is three orders of magnitude stricter (lower) than that of Ethernet.
  • Power consumption: heat dissipation in high-performance networks is a major challenge, so there are strict limits on the power consumption of the optical cables.
  • Compatibility: active optical cables from different manufacturers power on in different sequences, and high-performance switches have strict requirements on the power-on sequence.
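To make the BER requirement concrete, the sketch below estimates the average time between bit errors on a single link. It assumes the link runs continuously at a full 200 Gb/s (the HDR rate of the cables used in this project) and takes 1E-12 as the Ethernet comparison point, a value inferred from the "three orders of magnitude" statement rather than quoted from the source. The function name is illustrative.

```python
def mean_seconds_between_errors(ber, line_rate_bps):
    """Average time between bit errors on a link running at full line rate.

    Errors per second = BER * bits per second, so the mean interval
    between errors is the reciprocal of that product.
    """
    return 1.0 / (ber * line_rate_bps)


HDR_RATE_BPS = 200e9  # 200 Gb/s; assumes the link is always fully loaded

# 1E-15 is the requirement stated in the text.
strict = mean_seconds_between_errors(1e-15, HDR_RATE_BPS)
# 1E-12 is an assumed Ethernet-class target, three orders of magnitude looser.
relaxed = mean_seconds_between_errors(1e-12, HDR_RATE_BPS)

print(f"BER 1E-15: one bit error every {strict:.0f} s on average")
print(f"BER 1E-12: one bit error every {relaxed:.0f} s on average")
```

Under these assumptions, a fully loaded link at 1E-15 sees a bit error roughly every 5,000 seconds (about 83 minutes), compared with roughly every 5 seconds at 1E-12. Across the thousands of links in a cluster, that difference largely determines how often retransmissions disturb a tightly coupled parallel job.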

NADDOD has many years of R&D and technical experience in controlling the placement accuracy of optical cables and transceivers, matching high-speed signals and impedances, designing heat dissipation for high-speed products, and coupling optical paths. In this project, NADDOD's HPC IB HDR 200Gb/s QSFP56 AOCs fully met the GTP's high-performance network transmission requirements in terms of parameter performance, and efficiently supported the supercomputing center in providing on-demand, high-quality services 7x24.

Results

NADDOD high-performance network products are deployed in key positions of the national supercomputing center to support the high-speed, safe, reliable, and stable operation of its systems and to keep pace with the rapid development and business model innovation of the center. Building the General-Purpose Test Platform for high-performance computing is also one more practical verification of NADDOD's industry-leading R&D, manufacturing, and technical service capabilities. Our quality products and excellent performance have won the trust and favor of a world-leading supercomputing center.
