What Is RDMA and What Are Its Applications?

NADDOD Dylan InfiniBand Solutions Architect Jul 7, 2023

RDMA (Remote Direct Memory Access) technology originated in InfiniBand networks and was primarily used in high-performance scientific computing. With the rise of cloud computing, RDMA has gradually been adopted in cloud data center scenarios that require high performance.

1. The Technical Principles of RDMA

Image:RDMA-Data remote transfer

Traditional DMA (Direct Memory Access) allows peripheral devices such as network cards to access host memory directly, without involving the CPU, which reduces the load on the CPU. RDMA extends this idea across the network: data is transferred directly from the memory of one node to the memory of another node over a high-performance network, and the entire transfer is handled by network hardware without CPU intervention. This can greatly improve performance in scenarios with high throughput and massive data exchange.
Image:RDMA-End-to-End Flow Control

To implement remote data transfer, optimizations are needed along the entire end-to-end path. First, a dedicated RDMA protocol stack is required to replace the traditional TCP/IP protocol stack. Second, specialized RDMA network cards are necessary on the host side to offload the CPU. Finally, some degree of traffic and congestion control is required on the network side to ensure predictable RDMA performance.

1.1 On the Host Side

On the host side, there are two key technologies involved:

(1) Kernel Bypass

(2) NIC offloading

These two technologies can also exist independently of each other. DPDK, for example, is a typical kernel bypass technology: it delivers packets received by the network card directly to user-space memory, which avoids a memory copy from kernel space to user space and skips the kernel's packet processing path. NIC offloading, on the other hand, is a hardware function of the network card. In virtualized environments, for instance, many NICs can offload VXLAN encapsulation and decapsulation.

RDMA uses both of these technologies. For kernel bypass, RDMA adopts a dedicated protocol stack that exposes user-space memory directly to the network card, bypassing the traditional kernel packet processing path. For NIC offloading, the RDMA NIC takes over the entire transport layer logic, so the host CPU no longer has to process flow control. Together, these significantly reduce host CPU usage and transmission latency.
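The difference between the two data paths can be illustrated with a toy model. The sketch below is not the RDMA verbs API and does not touch real hardware; `ToyNode`, `register_memory`, `socket_style_send`, and `rdma_style_write` are hypothetical names invented for illustration. It only counts the copies that the host CPU would perform on each path, assuming one CPU copy per side on the traditional path and none on the RDMA-like path.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """A host with user-space memory and a counter of CPU-driven copies."""
    name: str
    memory: bytearray
    cpu_copies: int = 0
    registered: dict = field(default_factory=dict)  # key -> (offset, length)

    def register_memory(self, key: int, offset: int, length: int) -> None:
        """Expose a region of user-space memory to the (toy) NIC."""
        if offset < 0 or offset + length > len(self.memory):
            raise ValueError("region outside node memory")
        self.registered[key] = (offset, length)


def socket_style_send(src: ToyNode, dst: ToyNode, data: bytes, dst_offset: int) -> None:
    """Traditional path: each side's CPU copies between user and kernel buffers."""
    if dst_offset < 0 or dst_offset + len(data) > len(dst.memory):
        raise ValueError("write outside destination memory")
    kernel_tx = bytearray(data)        # sender CPU: user space -> kernel buffer
    src.cpu_copies += 1
    kernel_rx = bytearray(kernel_tx)   # stands in for the wire transfer (no CPU copy counted)
    dst.memory[dst_offset:dst_offset + len(kernel_rx)] = kernel_rx  # receiver CPU: kernel -> user
    dst.cpu_copies += 1


def rdma_style_write(dst: ToyNode, key: int, data: bytes) -> None:
    """RDMA-like path: the NIC places data straight into a registered region."""
    offset, length = dst.registered[key]
    if len(data) > length:
        raise ValueError("write exceeds registered region")
    # This assignment stands in for NIC DMA, so no host CPU copy is counted.
    dst.memory[offset:offset + len(data)] = data


sender = ToyNode("sender", bytearray(16))
receiver = ToyNode("receiver", bytearray(16))
receiver.register_memory(key=7, offset=4, length=8)

rdma_style_write(receiver, key=7, data=b"rdma")
print(sender.cpu_copies, receiver.cpu_copies)   # 0 0

socket_style_send(sender, receiver, b"tcp!", dst_offset=0)
print(sender.cpu_copies, receiver.cpu_copies)   # 1 1
print(bytes(receiver.memory[:8]))               # b'tcp!rdma'
```

In a real deployment the registration step and the write would go through an RDMA library and the NIC; the model only captures why the host CPU drops out of the data path.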

Image:RDMA-Big Data Training Process Diagram

1.2 On the Network Side

On the network side, while SDN addresses routing control, RDMA networks aim to solve traffic and congestion control problems. What kind of traffic needs to be addressed? For point-to-point traffic, switches that guarantee wire speed can handle high traffic volumes with ease. However, when a high-speed port feeds a low-speed port, or when many senders converge on a single receiver (many-to-one traffic), even the most powerful switch chip cannot forward everything, and its queues become congested. In such cases, the solution can only come from the source end, where each sender's traffic must be reduced to a rate that matches the capacity of the output port. The most commonly used method for flow control at the source end is TCP-based flow control. Its disadvantage is that it reacts relatively slowly: by the time feedback arrives from the destination end, packets may already have been lost in the middle of the network. RDMA solves this problem by providing end-to-end flow control, allowing the entire network to participate in feedback adjustment for traffic congestion.
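The many-to-one case can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only: the function names, the 100 Gbps port speed, the 0.1 Gbit buffer, and the 100 microsecond time step are all assumptions chosen for the example, not figures from any particular switch.

```python
def egress_queue(sender_rates_gbps, egress_gbps, buffer_gbit, steps, step_s=1e-4):
    """Track one switch output queue over time.

    Returns (depths, dropped_gbit): the queue depth in gigabits after each
    step, and the total traffic dropped because the buffer was full.
    """
    offered = sum(sender_rates_gbps)
    depth = 0.0
    dropped = 0.0
    depths = []
    for _ in range(steps):
        # The queue grows by whatever arrives beyond the port's drain rate.
        depth = max(0.0, depth + (offered - egress_gbps) * step_s)
        if depth > buffer_gbit:
            dropped += depth - buffer_gbit   # overflow is lost
            depth = buffer_gbit
        depths.append(depth)
    return depths, dropped


def fair_share_rates(n_senders, egress_gbps):
    """Per-sender rate at which the combined load exactly matches the output port."""
    return [egress_gbps / n_senders] * n_senders


# Four senders at line rate converge on one 100 Gbps port.
depths, dropped = egress_queue([100.0] * 4, egress_gbps=100.0, buffer_gbit=0.1, steps=10)
print(f"unpaced: final depth {depths[-1]:.2f} Gbit, dropped {dropped:.2f} Gbit")

# The same senders throttled at the source to the port's fair share.
paced = fair_share_rates(4, 100.0)
depths, dropped = egress_queue(paced, egress_gbps=100.0, buffer_gbit=0.1, steps=10)
print(f"paced at {paced[0]:.0f} Gbps each: final depth {depths[-1]:.2f} Gbit, dropped {dropped:.2f} Gbit")
```

With these assumed numbers the unpaced senders fill the buffer within a few steps and everything beyond it is dropped, while the paced senders never build a queue. This is why the fix has to come from the source end rather than from the switch.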

2. The Technical Features of RDMA

The analysis above shows that RDMA optimizes network transmission at multiple points along the end-to-end path. It is positioned as a high-performance networking technology, and its benefits are mainly reflected in the following aspects:

● Reducing CPU overhead: Through kernel bypass and zero-copy on the host side, together with offloading of the transport layer to the NIC, RDMA greatly reduces the load on the host CPU, thus indirectly improving computing efficiency.

● Fast congestion handling: In addition to the source end, the network also directly participates in congestion handling, allowing for the immediate detection of congestion and timely feedback to avoid large-scale packet retransmission.

● Low latency: Low latency is one of the most significant features of RDMA. Through streamlined processing on the host side and timely congestion feedback from the network, RDMA can effectively ensure predictable latency, improving communication efficiency.

In addition, although data center networks are typically built with Ethernet switches, RDMA can also run over Ethernet (for example, as RoCE), so its high-throughput characteristics can still be maintained there.
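The fast congestion handling described in the list above can be sketched as a feedback loop between a switch queue and its senders. The model below is a generic additive-increase, multiplicative-decrease loop chosen for illustration; it is an assumption of this example, not the specific algorithm that any RDMA NIC or switch implements, and `simulate_feedback` and all of its parameters are hypothetical.

```python
def simulate_feedback(n_senders, line_rate_gbps, egress_gbps, mark_threshold_gbit,
                      rounds, round_s=1e-4, increase_gbps=1.0, decrease_factor=0.5):
    """Simulate senders that react to congestion marks from a switch queue.

    Each round the switch measures its queue. If the depth exceeds the
    threshold it signals congestion and every sender cuts its rate;
    otherwise every sender probes upward by a small fixed step.
    Returns a list of (offered_gbps, depth_gbit, marked) tuples, one per round.
    """
    rates = [line_rate_gbps] * n_senders
    depth = 0.0
    history = []
    for _ in range(rounds):
        offered = sum(rates)
        depth = max(0.0, depth + (offered - egress_gbps) * round_s)
        marked = depth > mark_threshold_gbit
        if marked:
            # Congestion feedback from the network: back off multiplicatively.
            rates = [r * decrease_factor for r in rates]
        else:
            # No congestion signalled: increase gently, capped at line rate.
            rates = [min(line_rate_gbps, r + increase_gbps) for r in rates]
        history.append((offered, depth, marked))
    return history


history = simulate_feedback(n_senders=4, line_rate_gbps=100.0, egress_gbps=100.0,
                            mark_threshold_gbit=0.05, rounds=200)
peak_depth = max(depth for _, depth, _ in history)
marked_rounds = sum(1 for _, _, marked in history if marked)
print(f"peak queue depth: {peak_depth:.3f} Gbit, rounds with congestion marks: {marked_rounds}")
```

Because the switch signals congestion as soon as its queue crosses the threshold, the queue in this model stays close to the threshold instead of growing until packets are lost, which is the behavior the bullet on fast congestion handling describes.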

3. The Widespread Applications of RDMA

RDMA technology has been widely adopted in various domains, including high-performance scientific computing, cloud computing, finance, telecommunications, and manufacturing. By significantly improving data transfer speed and efficiency, it makes large-scale data processing and analysis more efficient. Its low latency and high throughput meet the requirements of real-time data processing and fast response. It is also reliable and scalable, which allows it to address the challenges of large-scale data centers and complex network environments.

RDMA is therefore a powerful tool for high-performance data transfer, combining low latency, high throughput, reliability, and scalability. Whether you are an expert in high-performance computing or an enterprise that needs fast data transfer, RDMA is worth evaluating as a connection solution.

Naddod is a leading provider of optical network solutions. As a company that has been deeply involved in optical network connection technology for many years, Naddod provides customers with innovative, efficient, and reliable optical network products and solutions. Whether your industry is high-performance computing, data centers, finance, telecommunications, or manufacturing, Naddod can provide hardware solutions such as high-performance switches, AOC/DAC/optical modules, and intelligent network cards. Our products are known for their low cost and excellent performance, and can significantly improve your business acceleration capabilities. Naddod’s products and solutions are widely used across industries: for large-scale scientific computing, real-time data analysis, or the low-latency requirements of financial transactions, we can provide a suitable solution. Please visit Naddod’s official website www.naddod.com for more information and let us work together for success!