RDMA over Converged Ethernet (RoCE) vs. RDMA

NADDOD Gavin, InfiniBand Network Engineer · Jul 12, 2023

RDMA technology has attracted wide attention for its excellent performance and low latency. RoCE combines RDMA with Ethernet, enabling high-performance communication on existing infrastructure. This blog explores the advantages and application areas of RDMA and RoCE to help you build high-performance networks.

Wide application of RoCE

Since the emergence of Ethernet, its open and diverse ecosystem, rapidly increasing link speeds, and significant cost advantages have made it widely used across the industry. Among the various technical paths to RDMA, RoCE is the most widely deployed. In the Top500, the world-renowned ranking of high-performance computing systems, Ethernet interconnects account for more than half of the listed machines.

RoCE technology combines RDMA with Ethernet, making it possible to achieve low-latency and high-bandwidth data transmission on Ethernet through the use of special network adapters and switches. The emergence of RoCE technology not only makes RDMA technology easier to deploy and use, but also fully utilizes existing Ethernet infrastructure, providing users with high-performance network communication solutions.
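RoCE and native InfiniBand adapters expose the same verbs programming interface; what differs is the link layer reported by the port. As a rough, illustrative sketch (not specific to any particular vendor's hardware), the following libibverbs program enumerates the local RDMA devices and reports whether each one runs over Ethernet (RoCE) or native InfiniBand. It assumes libibverbs is installed and, for simplicity, only queries port 1 of each adapter.

```c
/* A minimal sketch: list RDMA devices and report their link layer.
 * Build with: gcc probe_rdma.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_port_attr port;
        /* Query port 1 only; multi-port adapters would loop over all ports. */
        if (ibv_query_port(ctx, 1, &port) == 0) {
            const char *fabric =
                port.link_layer == IBV_LINK_LAYER_ETHERNET   ? "RoCE (Ethernet)" :
                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "InfiniBand"      :
                                                               "unknown";
            printf("%s: %s\n", ibv_get_device_name(devs[i]), fabric);
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```

Because the verbs interface is shared, the same application code can generally run over either fabric once the device is selected.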

Analyzing the Limitations of RoCE

Due to the performance bottlenecks of traditional Ethernet, ordinary RoCE deployments still suffer from congestion, packet loss, and latency jitter under demanding workloads, making it difficult to meet the requirements of high-performance computing and storage.

In high-performance storage clusters, FC networks are connection-oriented, insensitive to network upgrades, and tolerant of process failures. At the same time, the FC frame header provides the protocol's transmission guarantees with low overhead: no packet loss, in-order delivery of data frames, and a reliable, low-latency network. Compared with FC, traditional Ethernet is prone to congestion and packet loss, and the retransmission of lost packets easily leads to out-of-order data. Ethernet also exhibits considerable jitter, and its store-and-forward mode involves complex lookups that add forwarding latency. In multicast scenarios, queues can become congested, and the resulting queuing delay cannot be ignored.

In HPC applications, traditional Ethernet has weaker message-encapsulation capabilities and complex lookup processes that lead to high forwarding latency. Packet loss in the network leaves processors idle while they wait for data, dragging down overall parallel-computing performance. According to 2017 test results from the ODCC (Open Data Center Committee), traditional Ethernet trailed specialized interconnects by up to 30% in supercomputing cluster applications.

Exploring the birth and development of RDMA

With the improvement in storage and computing performance, the access latency between the two in data centers has been reduced from about 10 ms to around 20 µs, an improvement of roughly five hundred times. However, if network transmission still relies on the TCP protocol, its packet-loss and retransmission mechanisms keep network latency at the millisecond level, which cannot meet the latency requirements of high-performance computing and storage. Against this backdrop, RDMA technology offers a new approach to improving network performance.

RDMA extends the idea of DMA to communication between two or more computers, allowing the memory of one host to be accessed directly from another. In traditional TCP/IP packet processing, data must pass through the operating system and other software layers, consuming a large amount of server resources and memory-bus bandwidth: data is copied back and forth between system memory, processor caches, and the network controller's cache, placing a heavy burden on the server's CPU and memory. In particular, the severe mismatch between network bandwidth, processor speed, and memory bandwidth exacerbates the effects of network latency.

As a new direct memory access technology, RDMA allows a computer to access the memory of another computer without going through the remote processor. RDMA enables data to be moved quickly from one system into the memory of a remote system without involving the operating system on the data path.
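The building block that makes a buffer remotely accessible is memory registration. The sketch below (a minimal illustration, not a complete RDMA application) uses the standard libibverbs calls to open the first local device, allocate a protection domain, and register a buffer; the buffer size, device choice, and access flags are illustrative assumptions, and a real program would also create queue pairs and exchange the address and rkey with its peer out of band.

```c
/* A minimal memory-registration sketch with libibverbs.
 * Build with: gcc reg_mr.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "failed to open device\n");
        return 1;
    }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);   /* protection domain */

    size_t len = 4096;
    void *buf = malloc(len);

    /* Pin the buffer and hand it to the NIC; a remote peer that learns
     * (address, rkey) can then read or write it without involving this CPU. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    printf("buffer %p registered, rkey=0x%x lkey=0x%x\n", buf, mr->rkey, mr->lkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```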

RDMA achieves network acceleration and protocol diversity

RDMA combines intelligent network cards with an optimized software architecture, providing strong support for high-speed direct access to remote memory. By implementing the RDMA protocol in hardware (i.e., on the network card) and using techniques such as zero-copy and kernel bypass, it achieves high-performance remote data access. The following figure shows the working principle of RDMA, which gives applications the following advantages in communication:

The working principle of RDMA

Zero copy: applications can transfer data directly without involving the network software stack. Data is sent from, or received into, application buffers without being copied into network-layer buffers.

Kernel bypass: applications issue data transfers directly from user mode, avoiding context switches between kernel and user mode.

No CPU involvement: an application can access remote host memory without consuming any CPU cycles on the remote host and without involving any processes (or CPUs) on that host, so the remote CPU's caches are not filled with the contents of the accessed memory.

Message-based transactions: data is handled as discrete messages rather than as a stream, so the application does not need to segment a stream into separate messages/transactions.

Scatter/gather support: RDMA natively supports scatter/gather, so multiple memory buffers can be read and sent as a single stream, or an incoming stream can be received and written into multiple memory buffers (see the sketch below).
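To make the zero-copy and scatter/gather points above concrete, the fragment below is a hedged sketch of posting a single RDMA write that gathers two separate local buffers into one transfer. It assumes a connected queue pair and registered memory regions already exist, and that the peer's remote address and rkey were exchanged beforehand; the function and parameter names are illustrative, not taken from the article.

```c
#include <stdio.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Post one RDMA WRITE that gathers two local buffers into a single transfer.
 * qp must be a connected queue pair; mr1/mr2 cover buf1/buf2; remote_addr and
 * remote_rkey were learned from the peer out of band. */
static int post_gathered_write(struct ibv_qp *qp,
                               void *buf1, uint32_t len1, struct ibv_mr *mr1,
                               void *buf2, uint32_t len2, struct ibv_mr *mr2,
                               uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge[2] = {
        { .addr = (uintptr_t)buf1, .length = len1, .lkey = mr1->lkey },
        { .addr = (uintptr_t)buf2, .length = len2, .lkey = mr2->lkey },
    };

    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = sge,
        .num_sge    = 2,                   /* gather two buffers into one write */
        .opcode     = IBV_WR_RDMA_WRITE,   /* remote CPU is not involved */
        .send_flags = IBV_SEND_SIGNALED,   /* request a completion entry */
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    struct ibv_send_wr *bad_wr = NULL;
    int rc = ibv_post_send(qp, &wr, &bad_wr);  /* hand the request to the NIC */
    if (rc)
        fprintf(stderr, "ibv_post_send failed: %d\n", rc);
    /* The completion is later reaped from the send CQ with ibv_poll_cq(). */
    return rc;
}
```

Note that the data never passes through the kernel or an intermediate network buffer: the NIC reads both source buffers directly and writes them into the remote memory region, which is exactly the zero-copy, CPU-offloaded behavior described above.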


Currently, RDMA technology is widely deployed in high-performance scenarios such as supercomputing, AI training, and storage. There is, however, a diversity of choices on the RDMA technology roadmap, and different users and vendors have different preferences. Mainstream RDMA technologies fall into two major camps: IB (InfiniBand) and RDMA-capable Ethernet technologies such as RoCE and iWARP. The IBTA (InfiniBand Trade Association) focuses on IB and RoCE, while iWARP is standardized by the IETF. In the storage field, RDMA-capable protocols have long existed, such as SRP (SCSI RDMA Protocol) and iSER (iSCSI Extensions for RDMA). The emerging NVMe over Fabrics is, in essence, NVMe over RDMA whenever it does not use FC or TCP as the transport. In other words, NVMe over InfiniBand, NVMe over RoCE, and NVMe over iWARP all fall within the scope of NVMe over RDMA.
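One practical consequence of this diversity is that the rdma_cm/verbs programming interface is largely transport-agnostic: the same connection-establishment code typically runs over InfiniBand, RoCE, or iWARP, with the fabric selected by whichever device owns the address being resolved. The librdmacm sketch below illustrates the client-side address-resolution step; the peer address 192.168.0.10 and port 7471 are placeholders, not values from the article.

```c
/* Minimal client-side connection setup with librdmacm; the fabric (IB, RoCE,
 * or iWARP) is chosen by the device that owns the resolved address.
 * Build with: gcc cm_connect.c -lrdmacm -libverbs */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <rdma/rdma_cma.h>

int main(void)
{
    struct rdma_event_channel *ec = rdma_create_event_channel();
    struct rdma_cm_id *id = NULL;
    if (!ec || rdma_create_id(ec, &id, NULL, RDMA_PS_TCP)) {
        perror("rdma_create_id");
        return 1;
    }

    /* Placeholder peer address and port. */
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(7471);
    inet_pton(AF_INET, "192.168.0.10", &dst.sin_addr);

    /* Resolving the address binds the cm_id to a local RDMA device; from here
     * on, the verbs code path is the same for IB, RoCE, or iWARP. */
    if (rdma_resolve_addr(id, NULL, (struct sockaddr *)&dst, 2000)) {
        perror("rdma_resolve_addr");
        return 1;
    }

    struct rdma_cm_event *event = NULL;
    if (rdma_get_cm_event(ec, &event) == 0) {
        printf("event: %s\n", rdma_event_str(event->event));
        rdma_ack_cm_event(event);
    }

    rdma_destroy_id(id);
    rdma_destroy_event_channel(ec);
    return 0;
}
```

A full client would continue with rdma_resolve_route and rdma_connect before posting work requests, but the point here is only that the transport choice does not change the application-facing API.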

Summary

With the continuous growth of data center and high-performance computing demands, RDMA, as a high-performance, low-latency data transfer technology, will continue to play an important role. Whether they choose InfiniBand or RDMA over Converged Ethernet, users and vendors should decide based on their own requirements and circumstances. InfiniBand has broad adoption and a mature ecosystem in supercomputing, while RoCE and iWARP are better suited to high-performance computing and storage scenarios in Ethernet environments.

As a leading provider of integrated optical network solutions, NADDOD is committed to providing customers with innovative, efficient, and reliable products, solutions, and services. NADDOD's high-performance switches, AOC/DAC/optical modules, and intelligent NICs form complete solutions based on InfiniBand and lossless Ethernet (RoCE) to meet customer needs in different application scenarios, helping users accelerate their business and improve performance in high-performance computing, artificial intelligence, and storage. Please visit the NADDOD website for more information!