RoCE vs. InfiniBand: How to Choose for HPC Networking?

NADDOD Brandon, InfiniBand Technical Support Engineer, Jul 18, 2023

High-performance computing network platforms address a common problem: GPU-based programs in geological exploration require an IB stack, while traditional TCP/IP stacks cannot sustain HPC-grade network communication. The RoCE v2 architecture is gradually gaining customer acceptance, with a growing ecosystem and mature applications. It also improves network transmission efficiency and reliability, and RoCE v2 reduces host CPU consumption.

Before vs. After RoCE

HPC refers to aggregating computing power to handle data-intensive computational tasks that a standard workstation cannot complete, such as the simulation, modeling, and rendering required in exploration businesses. Many computational problems either demand more computation than a general-purpose computer can finish in a reasonable time, or involve data volumes too large for the available resources, making the computation impractical on a single machine.

 

HPC overcomes these limits by using specialized or high-end hardware, or by combining the capabilities of multiple computing units. Data and operations are distributed across those units, which requires introducing parallelism. Different modeling problems expose different degrees of parallelism. In a parametric sweep, for example, many similar models with independent geometries, boundary conditions, or material properties are solved, and the work parallelizes almost completely: each model is simply assigned to its own computing unit. Such workloads are therefore called "embarrassingly parallel" problems. Parallel problems are very sensitive to network speed and latency within the cluster; if the network is too slow, communication cannot keep up and the whole computation slows down. Conversely, connecting general-purpose hardware with a fast network can accelerate these problems.
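The parametric-sweep pattern described above can be sketched in a few lines. This is an illustrative toy, not exploration software: `solve_model` is a hypothetical stand-in for one independent simulation run, and each parameter set is solved with no communication between tasks.

```python
from multiprocessing import Pool

def solve_model(params):
    """Hypothetical stand-in for one independent model solve:
    each (geometry, material) pair is computed with no data
    exchanged between tasks."""
    geometry, material = params
    # Trivial placeholder computation representing the per-model solve.
    return geometry * material

if __name__ == "__main__":
    # Each (geometry, material) pair is an independent model in the sweep.
    sweep = [(g, m) for g in range(1, 5) for m in (0.5, 1.0, 2.0)]
    with Pool(processes=4) as pool:
        # One model per worker: near-perfect parallel scaling, since the
        # tasks never exchange data during the solve.
        results = pool.map(solve_model, sweep)
    print(results)
```

Because the workers never communicate mid-solve, this kind of job scales with the number of computing units and places only modest demands on the interconnect; tightly coupled solvers are the ones that stress network latency.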

InfiniBand: Excellent Acceleration Performance, but Confronting the Cost Challenge

In traditional networks, the CPU consumption of the TCP/IP stack grows with network access bandwidth. HPC networks therefore usually adopt RDMA to offload the TCP/IP stack from compute nodes, reducing both CPU consumption and network transmission latency.

 

RDMA transfers data directly between the memory of two servers without involving either server's CPU (also known as zero-copy networking), enabling more efficient communication. The processing is performed on RDMA-capable network interface cards (NICs), which bypass the TCP/IP stack and accelerate data transfer. Data lands directly in remote memory on the target server, freeing CPU and I/O capacity on both ends for other tasks.

 

Traditional IB switch architecture leverages RDMA to give HPC a high-performance, low-latency network platform with industry-leading forwarding delay. However, InfiniBand switches use their own independent architecture and protocols (the IB protocol and specifications):

 

1. IB devices can only interconnect with other devices that support the IB protocol.

2. The InfiniBand ecosystem is relatively closed, making equipment difficult to replace.

3. An InfiniBand fabric requires a separate gateway to interface with traditional Ethernet networks.

 

In an overall HPC platform, many applications are not strictly latency-sensitive. Carrying a large number of such applications on expensive IB switch ports quietly inflates an enterprise's computing, maintenance, and management costs, and constrains overall system expansion. Looking at the industry trend of Ethernet bandwidth growth through 10G/25G/40G/100G, many existing IB-based networks will need to expand in bandwidth, port density, and other dimensions as computing scale grows. For non-latency-critical HPC applications, Ethernet is the preferred replacement for existing IB switches to reduce cost.

RoCE: Reducing Costs, Maintaining Network Acceleration Performance

The RoCE specification implements RDMA over Ethernet and requires a lossless network. RoCE's main advantages are low latency and high network utilization. It also bypasses TCP/IP and uses hardware offloading, resulting in lower CPU utilization.

RDMA application

The RoCE v2 standard enables RDMA transport over routed Layer 3 Ethernet networks. The specification replaces the InfiniBand network layer with IP and UDP headers carried over the Ethernet link layer, which makes RoCE traffic routable by standard IP routers.

 

RoCE v1 Protocol: Leverages Ethernet for RDMA and is limited to Layer 2 networks. Its packet format adds an Ethernet header to the existing IB packet; RoCE frames are identified by Ethertype 0x8915.

 

RoCE v2 Protocol: Leverages UDP/IP for RDMA and is deployable in Layer 3 networks. Its packet format adds UDP, IP, and Ethernet headers to the existing IB packet; RoCE v2 packets are identified by UDP destination port 4791. RoCE v2 supports load balancing by hashing the UDP source port, allowing ECMP to spread flows across paths and improve network utilization.
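The two identification rules and the ECMP behavior can be illustrated with a small sketch. Only the constants 0x8915 and 4791 come from the specification; the classifier, the CRC32-based hash, and the path count are simplified illustrations (real routers use their own hash functions and more header fields).

```python
import zlib

ROCE_V1_ETHERTYPE = 0x8915   # RoCE v1: RDMA frames identified at Layer 2
ROCE_V2_UDP_DPORT = 4791     # RoCE v2: RDMA packets identified by UDP dst port

def classify(ethertype, udp_dport=None):
    """Classify a frame the way the spec distinguishes RoCE v1 and v2."""
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCEv1"      # IB payload directly after the Ethernet header
    if ethertype == 0x0800 and udp_dport == ROCE_V2_UDP_DPORT:
        return "RoCEv2"      # IB payload after IP and UDP headers
    return "other"

def ecmp_path(src_ip, dst_ip, udp_sport, n_paths):
    """Toy ECMP hash: a router hashes header fields (including the UDP
    source port) to pick one of n equal-cost paths, so varying the
    source port per flow spreads RoCE v2 traffic across paths."""
    key = f"{src_ip}-{dst_ip}-{udp_sport}".encode()
    return zlib.crc32(key) % n_paths

print(classify(0x8915))                  # RoCEv1
print(classify(0x0800, udp_dport=4791))  # RoCEv2
# Different source ports land flows on different equal-cost paths.
print({ecmp_path("10.0.0.1", "10.0.0.2", p, 4) for p in range(49152, 49160)})
```

This is why RoCE v1 cannot cross a router (its frames have no IP header to route on), while RoCE v2 looks like ordinary UDP/IP traffic to the network.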

 

This innovation meets enterprises' growing demand for high performance and horizontally scalable architecture. RoCE v2 enables a converged, high-density data center network while providing a fast migration path for IB-based applications, reducing development workload and making it easier for users to deploy and migrate applications.

RoCE for HPC Network

Many mainstream network vendors support RoCE solutions. Taking one vendor as an example, a typical solution uses the CN12000 as the access core and forms three networks: a computing network, a management network, and a storage network. The computing network provides high port density and high forwarding capacity, and key technologies such as RDMA are used together with the hosts to smoothly migrate high-performance applications developed on the IB protocol to a lower-cost Ethernet switching network.

 

High-performance network products greatly simplify the architecture and reduce the latency introduced by multi-tier designs, supporting smooth bandwidth upgrades for critical computing nodes. With RoCE v2 as the core standard and RoCE v2 plus DCE/DCB supported on computing nodes, the complexity and extra work of program migration are eliminated, and the TCP/IP stack's consumption of host CPU on compute nodes is reduced. With technologies such as PFC supported by the core network, the high-performance computing network gains greater openness, lowering the cost of building the entire cluster platform without sacrificing computing efficiency.
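PFC (Priority-based Flow Control, IEEE 802.1Qbb) is what makes the Ethernet fabric lossless enough for RoCE: when one priority's ingress queue fills, the receiver pauses only that priority, leaving other traffic classes untouched. The sketch below is a toy model of that behavior; the class name and the XOFF/XON watermarks are illustrative inventions, not a real switch implementation.

```python
class PfcIngressQueue:
    """Toy model of one PFC priority queue on a receiving port
    (hypothetical class; watermark values are illustrative)."""
    def __init__(self, xoff=8, xon=4):
        self.depth = 0
        self.xoff, self.xon = xoff, xon   # pause / resume watermarks
        self.paused = False

    def enqueue(self):
        self.depth += 1
        if self.depth >= self.xoff:
            self.paused = True            # send PAUSE for this priority only

    def dequeue(self):
        if self.depth:
            self.depth -= 1
        if self.depth <= self.xon:
            self.paused = False           # send resume (XON) once drained

# RoCE traffic on priority 3 backs up; priority 0 keeps flowing.
queues = {0: PfcIngressQueue(), 3: PfcIngressQueue()}
for _ in range(8):
    queues[3].enqueue()
print(queues[3].paused, queues[0].paused)   # True False
```

The per-priority pause is the key point: RoCE traffic can be protected from drops while ordinary TCP/IP traffic on other priorities is never stalled by it.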

Conclusion: Choosing Naddod InfiniBand or RoCE Solutions to Build a High-Performance, Lossless Network

The choice between InfiniBand and RoCE for building a high-performance, lossless network depends on the specific requirements of your application and infrastructure. Both provide low latency, high bandwidth, and low CPU overhead, making them suitable for high-performance computing.

 

Naddod provides lossless network solutions based on both InfiniBand and RoCE to help customers build high-performance computing capabilities and lossless network environments. Depending on the application scenario and user requirements, Naddod selects the optimal solution for the actual situation, delivering high-bandwidth, low-latency, high-performance data transmission. This effectively removes network bottlenecks and improves network performance and user experience.

 

Naddod provides high-speed InfiniBand and RoCE products, including HDR/NDR 200G/400G and Ethernet 200G/400G AOCs, DACs, and optical modules, which deliver excellent performance and significantly improve customers' business acceleration capabilities at low cost. Naddod always puts customers first and continually creates value for customers across industries. Its products and solutions have earned customers' trust with high quality and excellent performance, and are widely used in high-performance computing, data centers, education, research, biomedicine, finance, energy, autonomous driving, the internet, manufacturing, and telecom operators. Naddod works closely with customers, providing reliable and efficient network technology to help them succeed in the digital era. Whether you choose InfiniBand or RoCE, Naddod will be your trusted partner.