RDMA over Converged Ethernet vs. RDMA

NADDOD Gavin, InfiniBand Network Engineer, Jul 12, 2023

RDMA technology has attracted attention for its excellent performance and low latency, and RoCE combines RDMA with Ethernet to deliver high-performance communication over existing infrastructure. This blog explores the advantages and application areas of RDMA and RoCE technology to help you build high-performance networks.

Wide application of RoCE

Since the emergence of Ethernet, its open and diverse ecosystem, rapidly increasing link speeds, and significant cost advantages have made it widely used across the industry. Among the various RDMA technology paths, RoCE is the most widely deployed. In the TOP500 ranking of the world's leading high-performance computing systems, Ethernet interconnects account for more than half of the machines.

RoCE technology combines RDMA with Ethernet: with RDMA-capable network adapters and switches, low-latency, high-bandwidth data transmission can be achieved over Ethernet. RoCE not only makes RDMA easier to deploy and use, but also takes full advantage of existing Ethernet infrastructure, giving users a high-performance network communication solution.
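As a concrete illustration, a RoCE adapter is exposed to applications through the same verbs interface as a native InfiniBand adapter; the difference shows up in the port's link layer. The sketch below (assuming a Linux host with the rdma-core/libibverbs stack installed, and querying port 1 of each device) lists the local RDMA devices and reports whether each one runs over Ethernet, i.e. RoCE, or over InfiniBand.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_port_attr port;
        /* Port numbering starts at 1; single-port NICs are assumed here. */
        if (ibv_query_port(ctx, 1, &port) == 0) {
            printf("%s: link layer = %s\n",
                   ibv_get_device_name(devs[i]),
                   port.link_layer == IBV_LINK_LAYER_ETHERNET
                       ? "Ethernet (RoCE)" : "InfiniBand");
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```

Compile with gcc and link against -libverbs; the same program runs unchanged on RoCE and InfiniBand hosts, which is part of what makes RoCE easy to adopt on existing Ethernet.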

Analyzing the Limitations of RoCE

However, because of the performance limitations of traditional Ethernet, typical RoCE deployments still suffer losses from congestion, packet loss, and latency jitter under demanding workloads, making it difficult to meet the requirements of high-performance computing and storage.

In high-performance storage clusters, Fibre Channel (FC) networks are connection-oriented and largely insensitive to network upgrades and process failures. Meanwhile, the FC frame header carries rich transport functions, providing low protocol overhead, no packet loss, in-order delivery of data frames, and a reliable, low-latency network. Compared with FC, traditional Ethernet is prone to congestion and packet loss, and retransmitting lost packets easily leads to out-of-order data. Ethernet also exhibits significant jitter, and its store-and-forward mode involves complex lookup processing that results in high forwarding latency. In multicast scenarios, queues can become congested, and the resulting queuing latency cannot be ignored.

In HPC applications, traditional Ethernet has weaker message-encapsulation capabilities, and its complex lookup processing leads to high forwarding latency. Transmission losses in the network leave processors idle while waiting for data, dragging down overall parallel computing performance. According to 2017 test results from the Open Data Center Committee (ODCC), traditional Ethernet trailed specialized interconnects by up to 30% in supercomputing cluster applications.

Exploring the birth and development of RDMA

With improvements in storage and computing performance, the access latency between the two inside data centers has been optimized from about 10 ms down to roughly 20 µs, an improvement of roughly 500 times (10 ms = 10,000 µs; 10,000 µs / 20 µs = 500). However, if network transmission still relies on the TCP protocol, network latency remains at the millisecond level because of TCP's packet-loss and retransmission mechanism, which cannot meet the latency requirements of high-performance computing and storage. It is here that RDMA technology provides a new approach to improving network performance.

RDMA extends the idea of DMA to communication between two or more computers, allowing one host to access another host's memory directly. In traditional TCP/IP packet processing, data must pass through the operating system and other software layers, consuming a large amount of server resources and memory-bus bandwidth: data is copied back and forth between system memory, processor caches, and the network controller's cache, placing a heavy burden on the server's CPU and memory. In particular, the severe mismatch between network bandwidth, processor speed, and memory bandwidth exacerbates latency.
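For contrast, the conventional path looks like this from the application's side: a plain TCP send loop (a hypothetical helper for illustration, not code from this article) in which every send() call crosses into the kernel, and the kernel copies the user buffer into its own socket buffers before the NIC can transmit it.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Conventional TCP send path: each call traps into the kernel, which
 * copies the user buffer into kernel socket buffers before the NIC
 * transmits it -- the copies and context switches RDMA is designed
 * to avoid. */
static ssize_t tcp_send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0); /* kernel copy happens here */
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```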

As a direct memory access technology, RDMA lets a computer access another computer's memory without involving either host's processor. Data can be moved quickly from one system into the remote system's memory without any impact on the operating systems along the way.
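To make the zero-copy, kernel-bypass path concrete, here is a minimal sketch against the libibverbs verbs API that posts a one-sided RDMA WRITE: the local adapter reads the application buffer directly and writes it into the peer's memory without involving the remote CPU or either kernel on the data path. Queue pair creation, connection establishment, and the exchange of the remote address and rkey are assumed to have happened elsewhere (for example via rdma_cm), and rdma_write_example is a hypothetical helper named only for this illustration.

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* One-sided RDMA WRITE: the NIC moves `buf` straight into a buffer the
 * peer registered earlier.  QP setup and the exchange of remote_addr
 * and rkey are assumed to be done already. */
int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       void *buf, uint32_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register local memory so the adapter can DMA from it (zero-copy);
     * in real code this is done once and the MR is reused. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr = {0};
    struct ibv_send_wr *bad_wr = NULL;
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: remote CPU is not involved */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;        /* obtained from the peer out of band */
    wr.wr.rdma.rkey        = rkey;

    /* Kernel bypass: the work request goes to the NIC from user space. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

Completion would then be reaped from the associated completion queue with ibv_poll_cq; the kernel is not on the data path at any point.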

RDMA achieves network acceleration and protocol diversity

RDMA technology pairs intelligent network cards with an optimized software architecture, providing strong support for high-speed direct access to remote memory. By embedding the RDMA protocol in hardware (the network card) and using techniques such as zero-copy and kernel bypass, it achieves high-performance remote data access. The figure below illustrates the working principle of RDMA, which gives users the following communication advantages: