Advantages and Working Principle of RoCE v2 in RDMA Protocol

NADDOD Adam Connectivity Solutions Consultant Feb 26, 2024

RoCE (RDMA over Converged Ethernet) is a remote direct memory access protocol for Ethernet, designed to achieve high-performance, low-latency data transfer. The first version, RoCE v1, ran directly on the Ethernet link layer (EtherType 0x8915), so its traffic could not be routed across IP subnets. RoCE v2 was introduced to address this shortcoming by carrying RDMA traffic inside UDP/IP packets.

Remote Direct Memory Access (RDMA) is a data transfer mechanism that moves data from the memory of one computer to the memory of another without involving the host CPU in the transfer. This approach avoids much of the overhead of the traditional TCP/IP protocol stack and improves data transfer efficiency. As an RDMA-based protocol, RoCE v2 inherits these advantages of RDMA and adapts them to Ethernet.

Basic features of RoCE v2

  • IPv4 and IPv6 Support: Because RoCE v2 packets are carried in UDP over IP, they can be routed over both IPv4 and IPv6 networks and across subnet boundaries, which RoCE v1 could not do. This makes RoCE v2 suitable for large routed data center fabrics and for future networking trends.

  • Multi-Queue Support: RoCE v2 adapters expose many queue pairs (QPs), each an independent pair of send and receive queues, so applications can drive many concurrent connections in parallel, improving throughput and concurrency.

  • Hardware Independence: RoCE v2 is an open InfiniBand Trade Association (IBTA) specification, so it can be implemented on Ethernet adapters from different vendors and carried over standard Ethernet switches from different vendors, adapting well to diverse hardware environments.

  • Network Layer Optimization: Operating above the IP layer lets RoCE v2 use standard IP routing, equal-cost multipath (ECMP) load balancing, and ECN-based congestion notification; varying the UDP source port per flow gives switches the entropy they need to spread RoCE v2 traffic across equal-cost paths. These properties suit high-load, low-latency scenarios such as data centers and high-performance computing.

  • Hardware Acceleration: RoCE v2 transport processing (segmentation, reliability, and DMA to and from application memory) is offloaded to the RDMA-capable adapter. This reduces the burden on the host CPU and lowers transmission latency, while switches simply forward RoCE v2 packets as ordinary UDP/IP traffic.
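Because RoCE v2 runs over UDP/IP, switches can load-balance it with ordinary ECMP hashing over each packet's 5-tuple, and the UDP source port is the field that lets different flows between the same two hosts hash differently. The simulation below is a minimal sketch under stated assumptions: CRC32 stands in for whatever hash a real switch uses, and the helper names (`ecmp_uplink`, `spread_qps`) and the rule that derives the source port from the QP number are hypothetical, not any vendor's scheme.

```python
import zlib
from collections import Counter

ROCE_V2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCE v2


def ecmp_uplink(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                num_uplinks: int, proto: int = 17) -> int:
    """Pick an uplink index by hashing the 5-tuple (toy stand-in for a switch hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_uplinks


def spread_qps(num_qps: int, num_uplinks: int, vary_src_port: bool) -> Counter:
    """Count how many QPs between one host pair land on each uplink."""
    counts: Counter = Counter()
    for qpn in range(num_qps):
        # Hypothetical port choice: derived from the QP number when varied,
        # fixed otherwise. Real adapters use their own scheme.
        src_port = 49152 + qpn if vary_src_port else 49152
        uplink = ecmp_uplink("10.0.0.1", "10.0.1.1", src_port,
                             ROCE_V2_UDP_PORT, num_uplinks)
        counts[uplink] += 1
    return counts


if __name__ == "__main__":
    print("fixed source port: ", dict(spread_qps(64, 4, vary_src_port=False)))
    print("varied source port:", dict(spread_qps(64, 4, vary_src_port=True)))
```

With a fixed source port, every QP shares one 5-tuple and therefore one uplink; varying the port is what lets the fabric use all of its equal-cost paths.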

Working Principle of RoCE v2

Remote Direct Memory Access (RDMA) allows one computer system to read and write memory in another computer system directly, without involving the target host's CPU. By bypassing the traditional TCP/IP protocol stack, RDMA reduces the complexity and latency of data transfer.

RoCE v2 sits above the network layer of the protocol stack: it carries the InfiniBand transport protocol inside User Datagram Protocol (UDP) datagrams over the Internet Protocol (IP). Each RoCE v2 packet consists of an Ethernet header, an IPv4 or IPv6 header, a UDP header with destination port 4791, the InfiniBand Base Transport Header (BTH), the payload, and an invariant CRC (ICRC). This encapsulation lets RoCE v2 use RDMA for direct memory access while remaining routable over ordinary IP networks, enabling high-performance data transfer over Ethernet.
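In the InfiniBand transport that RoCE v2 reuses, the RDMA-specific fields live in a 12-byte Base Transport Header (BTH) placed right after the UDP header. The sketch below packs that layout with Python's standard `struct` module. It is a simplified illustration, not a conforming implementation: it builds only the UDP header, a BTH for a single Reliable Connection SEND, and the padded payload; it leaves the UDP checksum at zero and uses a zeroed placeholder instead of a real ICRC; the Ethernet and IP headers are omitted; and the function names are hypothetical.

```python
import struct

ROCE_V2_UDP_PORT = 4791      # IANA-assigned UDP destination port for RoCE v2
OPCODE_RC_SEND_ONLY = 0x04   # Reliable Connection, SEND Only
DEFAULT_PKEY = 0xFFFF        # default partition key


def build_bth(opcode: int, dest_qp: int, psn: int, pad_cnt: int = 0,
              ack_req: bool = True, pkey: int = DEFAULT_PKEY) -> bytes:
    """Pack a 12-byte Base Transport Header (simplified: SE, M, TVer left at 0)."""
    flags = (pad_cnt & 0x3) << 4                         # SE(1) M(1) PadCnt(2) TVer(4)
    qp_word = dest_qp & 0xFFFFFF                         # 8 reserved bits + 24-bit DestQP
    psn_word = (int(ack_req) << 31) | (psn & 0xFFFFFF)   # AckReq(1) + 7 reserved + PSN(24)
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def build_roce_v2_datagram(src_port: int, dest_qp: int, psn: int,
                           payload: bytes) -> bytes:
    """Return UDP header + BTH + padded payload + ICRC placeholder."""
    pad = (-len(payload)) % 4                  # PadCnt aligns the payload to 4 bytes
    bth = build_bth(OPCODE_RC_SEND_ONLY, dest_qp, psn, pad_cnt=pad)
    body = bth + payload + b"\x00" * pad
    icrc = b"\x00" * 4                         # placeholder: real ICRC is not computed here
    udp_len = 8 + len(body) + len(icrc)        # UDP length covers its header and everything after
    udp = struct.pack("!HHHH", src_port, ROCE_V2_UDP_PORT, udp_len, 0)  # checksum left at 0
    return udp + body + icrc


if __name__ == "__main__":
    dgram = build_roce_v2_datagram(49152, dest_qp=0x12, psn=100, payload=b"hello")
    print(len(dgram), dgram.hex())
```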

The core idea of RDMA is to bypass the host CPU, allowing the remote system to directly read and write local memory, thereby reducing the latency of data transfer and the processing overhead on the host. The key aspects of implementing RDMA over Ethernet with RoCE v2 include:

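RoCE v2 requires RDMA-capable network adapters (RNICs), which implement the RDMA transport in hardware and handle RDMA requests without host CPU involvement, enabling efficient memory access across the network. Switches do not process RDMA themselves; they forward RoCE v2 traffic as ordinary UDP/IP packets, but they are usually configured with features such as Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) so that the network is lossless, or close to it, for RDMA traffic.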

RoCE v2 Data Transfer Process

  • Connection Establishment: RoCE v2 uses a control path to establish connections, typically through the RDMA Communication Manager (RDMA CM) or an out-of-band channel such as a TCP socket. The two endpoints exchange the parameters each side needs, such as queue pair numbers (QPNs), starting packet sequence numbers (PSNs), and addressing information (GIDs), and then move their queue pairs into a ready-to-send state.

  • Data Transfer: Once the connection is established, the data transfer phase begins. On the data path, the adapter reads data from registered source memory and writes it directly into the destination memory, bypassing the host CPU and avoiding intermediate buffer copies, which is what makes the transfer zero-copy.

  • Connection Termination: After the data transfer is complete, a disconnect request is sent through the control path, and each endpoint tears down its queue pairs and releases the associated resources.
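These three phases map onto the queue pair (QP) state machine used by RDMA verbs: a QP moves from RESET to INIT, then to RTR (Ready To Receive) once it knows the peer's QP number, starting PSN, and GID, and finally to RTS (Ready To Send); teardown goes through ERR back to RESET. The toy model below captures that ordering in plain Python. It is a sketch, not the verbs API: the class and method names are hypothetical, and real code would call `ibv_modify_qp` (or let RDMA CM do it) with many more attributes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QPState(Enum):
    RESET = "RESET"
    INIT = "INIT"
    RTR = "RTR"   # Ready To Receive
    RTS = "RTS"   # Ready To Send
    ERR = "ERR"


# Forward transitions used when connecting; ERR and RESET are reachable from any state.
_FORWARD = {(QPState.RESET, QPState.INIT),
            (QPState.INIT, QPState.RTR),
            (QPState.RTR, QPState.RTS)}


@dataclass(frozen=True)
class ConnInfo:
    """Parameters each side sends its peer over the control path."""
    qpn: int   # queue pair number
    psn: int   # starting packet sequence number (24 bits)
    gid: str   # global identifier (IP-derived for RoCE v2)


class QueuePair:
    """Toy queue pair that enforces the connect / send / teardown ordering."""

    def __init__(self, qpn: int, psn: int, gid: str) -> None:
        self.local = ConnInfo(qpn, psn & 0xFFFFFF, gid)
        self.state = QPState.RESET
        self.remote: Optional[ConnInfo] = None
        self.next_psn = self.local.psn

    def modify(self, new: QPState) -> None:
        if new not in (QPState.ERR, QPState.RESET) and (self.state, new) not in _FORWARD:
            raise RuntimeError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new

    def connect(self, remote: ConnInfo) -> None:
        """Connection establishment: RESET -> INIT -> RTR (needs peer info) -> RTS."""
        self.modify(QPState.INIT)
        self.remote = remote
        self.modify(QPState.RTR)
        self.modify(QPState.RTS)

    def post_send(self, data: bytes) -> tuple[int, int, bytes]:
        """Data transfer: return (dest QPN, PSN, data) for one single-packet message."""
        if self.state is not QPState.RTS or self.remote is None:
            raise RuntimeError("queue pair is not connected")
        packet = (self.remote.qpn, self.next_psn, data)
        self.next_psn = (self.next_psn + 1) & 0xFFFFFF   # PSNs wrap at 24 bits
        return packet

    def disconnect(self) -> None:
        """Connection termination: flush to ERR, then return to RESET."""
        self.modify(QPState.ERR)
        self.modify(QPState.RESET)
        self.remote = None
        self.next_psn = self.local.psn


if __name__ == "__main__":
    a = QueuePair(0x11, 100, "::ffff:10.0.0.1")
    b = QueuePair(0x22, 200, "::ffff:10.0.1.1")
    a.connect(b.local)   # control path: each side learns the other's ConnInfo
    b.connect(a.local)
    print(a.post_send(b"hello"))
    a.disconnect()
    b.disconnect()
    print(a.state, b.state)
```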

The control path is primarily responsible for establishing and managing RDMA connections, including connection initialization, the handshake, and error handling. Control messages such as RDMA CM exchanges also travel as RoCE v2 packets, so they are carried in UDP as well; the RDMA-related information, however, sits in the InfiniBand transport headers and message payload inside the UDP datagram, not in the UDP header itself. This process typically involves cooperation between the adapters, their drivers, and the operating system.
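To make that layering concrete, here is a minimal decoder for the first 20 bytes of a UDP datagram: the 8-byte UDP header followed by the 12-byte Base Transport Header (BTH). The UDP destination port (4791) only identifies the datagram as RoCE v2; the opcode, destination queue pair, and packet sequence number all come from the BTH. Connection-management messages are addressed to queue pair 1, the General Services Interface (GSI) QP. This is a sketch: the function name and dictionary keys are illustrative, the opcode table lists only a few entries, and extension headers after the BTH are ignored.

```python
import struct
from typing import Optional

ROCE_V2_UDP_PORT = 4791
GSI_QPN = 1   # QP1, the General Services Interface QP that carries CM messages

# A few Base Transport Header opcodes (Reliable Connection and Unreliable Datagram)
OPCODE_NAMES = {
    0x04: "RC SEND Only",
    0x0A: "RC RDMA WRITE Only",
    0x0C: "RC RDMA READ Request",
    0x11: "RC Acknowledge",
    0x64: "UD SEND Only",
}


def parse_roce_v2(udp_datagram: bytes) -> Optional[dict]:
    """Decode the UDP header and BTH of a datagram; None if it is not RoCE v2."""
    if len(udp_datagram) < 8 + 12:
        return None
    src_port, dst_port, _length, _checksum = struct.unpack("!HHHH", udp_datagram[:8])
    if dst_port != ROCE_V2_UDP_PORT:
        return None   # the UDP header only says "this is RoCE v2"
    # Everything RDMA-specific comes from the BTH inside the UDP payload.
    opcode, _flags, pkey, qp_word, psn_word = struct.unpack("!BBHII", udp_datagram[8:20])
    dest_qp = qp_word & 0xFFFFFF
    return {
        "src_port": src_port,
        "opcode": OPCODE_NAMES.get(opcode, f"0x{opcode:02x}"),
        "pkey": pkey,
        "dest_qp": dest_qp,
        "ack_req": bool(psn_word >> 31),
        "psn": psn_word & 0xFFFFFF,
        "to_cm_qp": dest_qp == GSI_QPN,
    }
```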

The data path carries the actual data transfer and is crucial for achieving high performance in RoCE v2. On the data path, the adapter's hardware offload moves data directly from the source memory to the destination memory, without intermediate buffering. This eliminates the multiple copies and per-layer processing of the traditional TCP/IP protocol stack, which significantly lowers transmission latency and improves data transfer efficiency.
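The zero-copy idea can be modeled in a few lines. In RDMA, the target application registers a memory region ahead of time and shares its location and remote key (rkey) with the peer; an RDMA WRITE then lands directly in that region once the adapter has checked the key, permissions, and bounds, without the target application issuing a receive call. The sketch below is a toy model under simplifying assumptions: it addresses memory by offset rather than by virtual address, and the class names are hypothetical rather than part of any RDMA library.

```python
class MemoryRegion:
    """A registered buffer plus the remote key (rkey) that grants access to it."""

    def __init__(self, size: int, rkey: int, remote_write: bool = True) -> None:
        self.buf = bytearray(size)
        self.rkey = rkey
        self.remote_write = remote_write


class ResponderNic:
    """Toy model of the target adapter's memory-access checks for RDMA WRITE."""

    def __init__(self) -> None:
        self._regions: dict[int, MemoryRegion] = {}

    def register(self, size: int, rkey: int, remote_write: bool = True) -> MemoryRegion:
        mr = MemoryRegion(size, rkey, remote_write)
        self._regions[rkey] = mr
        return mr

    def rdma_write(self, rkey: int, offset: int, data: bytes) -> None:
        """Place data straight into the registered buffer; no receive call is involved."""
        mr = self._regions.get(rkey)
        if mr is None or not mr.remote_write:
            raise PermissionError("invalid rkey or remote write not permitted")
        if offset < 0 or offset + len(data) > len(mr.buf):
            raise ValueError("write falls outside the registered region")
        # Write in place through a memoryview: no staging buffer on the target side.
        memoryview(mr.buf)[offset:offset + len(data)] = data


if __name__ == "__main__":
    nic = ResponderNic()
    region = nic.register(16, rkey=0xABC)
    nic.rdma_write(0xABC, 4, b"data")
    print(bytes(region.buf))
```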

Comparison between RoCE v2 and Traditional Networks

  1. Performance Advantage

RoCE v2 provides lower latency and higher throughput than traditional TCP/IP networking, making it a strong choice for performance-sensitive applications such as high-performance computing and big data analytics.

  2. RDMA Support

RoCE v2 supports RDMA, enabling data to move directly between the memories of two hosts without involving the host CPU. This reduces the processing overhead of data transfer and improves efficiency.

  3. Protocol Stack

RoCE v2 carries the InfiniBand transport protocol over UDP/IP, while traditional networks use the TCP/IP protocol stack. The performance gain does not come from UDP itself: for reliable connections, delivery guarantees are still provided by the InfiniBand transport running on the adapter. It comes from offloading that transport to hardware and bypassing the operating system kernel on the data path, which avoids much of the CPU and memory-copy overhead of the TCP/IP stack.

  4. Applicability

RoCE v2 is used mainly in data center environments, particularly for applications that require low latency and high throughput. Traditional networks are more widely used in enterprise, internet, and general communication settings.

  5. Use Cases

RoCE v2 is commonly used for communication among high-performance computing, storage, and networking devices within large-scale data centers, while traditional networks are better suited to general enterprise networks and the internet.

  6. Configuration and Management

Deploying RoCE v2 may require specialized network hardware and configuration, such as PFC and ECN settings on switches, while traditional networks are typically easier to configure and manage because they rely on widely adopted standard protocols.

  7. Compatibility

Traditional networks have broader compatibility because they use the standard TCP/IP protocol stack, while RoCE v2 requires RDMA-capable adapters and supporting network configuration.
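A quick calculation helps explain the protocol-stack comparison. Per packet, RoCE v2 adds a UDP header, the 12-byte Base Transport Header, and a 4-byte ICRC, which is roughly the same number of bytes as a TCP header, so the speed-up is not a matter of smaller headers. The sketch below uses standard header sizes (Ethernet header and FCS, IPv4/IPv6 without options, TCP with and without the timestamp option) and ignores the preamble, inter-frame gap, and RDMA extension headers such as the RETH on RDMA WRITE packets; the function names are illustrative.

```python
ETH_HDR, ETH_FCS = 14, 4       # Ethernet II header and frame check sequence
IPV4, IPV6 = 20, 40            # base IP headers, no options or extension headers
UDP, BTH, ICRC = 8, 12, 4      # RoCE v2: UDP header, Base Transport Header, invariant CRC
TCP, TCP_TS_OPT = 20, 12       # TCP header; timestamp option (10 bytes + 2 NOP padding)

STACKS = {
    "RoCE v2 over IPv4": IPV4 + UDP + BTH + ICRC,
    "RoCE v2 over IPv6": IPV6 + UDP + BTH + ICRC,
    "TCP over IPv4": IPV4 + TCP,
    "TCP over IPv4 (timestamps)": IPV4 + TCP + TCP_TS_OPT,
}


def frame_overhead(stack: str) -> int:
    """Bytes added to each payload on the wire (excluding preamble and gap)."""
    return ETH_HDR + ETH_FCS + STACKS[stack]


def efficiency(stack: str, payload: int) -> float:
    """Fraction of frame bytes that are payload."""
    return payload / (payload + frame_overhead(stack))


if __name__ == "__main__":
    for name in STACKS:
        print(f"{name:28s} overhead={frame_overhead(name):3d} B  "
              f"efficiency@1024B={efficiency(name, 1024):.3f}")
```

At a 1024-byte payload the four stacks land within a few percent of each other, which is consistent with the point that RoCE v2's advantage comes from hardware offload and kernel bypass rather than from header size.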

When applying RoCE v2 in data center networks, it is essential to provide lossless network transmission and to practice fine-grained operations and maintenance, because these environments are sensitive to both latency and packet loss. RDMA networks also pose deployment challenges such as PFC storms, deadlocks, and complex ECN threshold design in multi-tier networks. NADDOD's experts have conducted research and accumulated knowledge on these issues and look forward to discussing them further with the community.
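ECN threshold design often starts from a RED-style marking curve like the one used in DCQCN deployments: a switch leaves packets unmarked while the egress queue is at or below a minimum threshold Kmin, marks with a probability that rises linearly to Pmax as the queue grows to Kmax, and marks every packet beyond Kmax. A common rule of thumb is that ECN should react before the PFC pause (XOFF) threshold is reached, so senders slow down before links are paused; the check below uses the conservative form Kmax < XOFF. This is a sketch: the threshold values and function names are illustrative assumptions, not vendor defaults or tuning advice, and multi-tier fabrics typically need per-tier values.

```python
def ecn_mark_probability(queue_kb: float, kmin_kb: float, kmax_kb: float,
                         pmax: float) -> float:
    """RED-style curve: 0 up to Kmin, linear up to Pmax at Kmax, 1 above Kmax."""
    if not (0 <= kmin_kb < kmax_kb) or not (0 < pmax <= 1):
        raise ValueError("need 0 <= Kmin < Kmax and 0 < Pmax <= 1")
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb > kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


def ecn_before_pfc(kmax_kb: float, pfc_xoff_kb: float) -> bool:
    """Rule-of-thumb check: full ECN marking should begin before PFC pauses the link."""
    return kmax_kb < pfc_xoff_kb


if __name__ == "__main__":
    # Illustrative thresholds only, not tuning recommendations.
    kmin, kmax, pmax = 100, 400, 0.2
    for q in (50, 100, 250, 400, 500):
        print(f"queue={q:4d} KB  mark_prob={ecn_mark_probability(q, kmin, kmax, pmax):.3f}")
    print("ECN reacts before PFC:", ecn_before_pfc(kmax, pfc_xoff_kb=600))
```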