What Is InfiniBand Network and Its Architecture?
With the rapid growth of central processing unit (CPU) computing power, high-speed interconnect (HSI) networks have become a key factor in the development of high-performance computers. HSI is a technology proposed to improve on the performance of the Peripheral Component Interconnect (PCI) bus for computer peripherals. After years of development, the main HSIs supporting high-performance computing (HPC) are Gigabit Ethernet and InfiniBand, with InfiniBand being the fastest-growing HSI. In this article, we will delve deeper into the InfiniBand architecture and compare it to traditional IP networks.
What Is InfiniBand Architecture?
InfiniBand is a communication link for data flow between processors and I/O devices, supporting up to 64,000 addressable devices. The InfiniBand Architecture (IBA) is an industry-standard specification that defines a point-to-point switching input/output framework, typically used for interconnecting servers, communication infrastructure, storage devices, and embedded systems.
InfiniBand features universality, low latency, high bandwidth, and low management cost, making it an ideal network for carrying multiple data streams (clustering, communication, storage, management) over a single connection, with fabrics scaling to thousands of interconnected nodes. The smallest complete IBA unit is a subnet, and multiple subnets are connected by routers to form a large IBA network. An IBA subnet consists of end nodes, switches, links, and a subnet manager.
InfiniBand networks are applied in scenarios such as data centers, cloud computing, HPC, machine learning, and AI. Their core goals are maximum network utilization, maximum CPU utilization, and maximum application performance.
InfiniBand Communication Channels
Traditionally, applications relied on the operating system to provide the communication services they needed. In contrast, InfiniBand enables applications to exchange data across a network without directly involving the operating system. This application-centric approach is the key differentiator between InfiniBand networks and traditional networks.
Main Components in Building InfiniBand Network
- InfiniBand Switches - Moves the traffic
- Subnet Manager - Manages all network activities
- Network Hosts - The clients for which the fabric is built
- Host Channel Adapters - Enable an InfiniBand connection between the Hosts and switches
- InfiniBand to Ethernet Gateway - Allows for IP traffic exchanges between InfiniBand and Ethernet based networks
- InfiniBand Router - Allows for interconnectivity between multiple InfiniBand subnets
InfiniBand Architecture vs TCP/IP
The InfiniBand architecture is divided into five layers, in a similar way to the traditional TCP/IP model, though there are many differences between InfiniBand and IP networks. In distributed storage networks, the commonly used protocols are RoCE, InfiniBand (IB), and TCP/IP; RoCE and IB both belong to the family of RDMA (Remote Direct Memory Access) technologies.
For applications with high I/O concurrency and low-latency requirements, such as high-performance computing and big data analytics, the existing TCP/IP software and hardware architecture cannot meet application demands. This is mainly because traditional TCP/IP communication passes messages through the kernel, which incurs high data-movement and data-copy costs. RDMA technology was developed to reduce the delay of server-side data processing in network transmission: it allows the network interface to access memory directly, without the intervention of the operating system kernel. This enables high-throughput, low-latency network communication, making it especially suitable for large-scale parallel computing clusters.
As commonly used network protocols in distributed storage, IB is often used in the storage front-end network of DPC scenarios, while TCP/IP is often used in business networks.
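The copy-cost difference described above can be illustrated with a toy Python model. This is not the real RDMA verbs API; the function names and the two-copy kernel path are simplified illustrations of the data movement involved in each approach:

```python
# Toy model (not the real verbs API): contrast the copy path of
# kernel-mediated TCP/IP messaging with RDMA's direct placement of
# data into the remote application's registered buffer.

def tcp_style_send(app_buf: bytes) -> bytes:
    """Data crosses the kernel: app buffer -> socket buffer -> wire."""
    kernel_socket_buf = bytes(app_buf)   # copy 1: user space into the kernel
    wire = bytes(kernel_socket_buf)      # copy 2: kernel buffer to the NIC
    return wire

def rdma_style_write(app_buf: memoryview, remote_buf: bytearray) -> None:
    """The HCA reads the registered local buffer and places the payload
    directly into the remote application's registered buffer; no kernel
    socket buffer and no intermediate copy on either host."""
    remote_buf[: len(app_buf)] = app_buf  # modeled as a single transfer

msg = b"hello, fabric"
remote = bytearray(64)
rdma_style_write(memoryview(msg), remote)
print(bytes(remote[: len(msg)]))  # b'hello, fabric'
```

The point of the sketch is the number of buffer-to-buffer copies, which is where the CPU and latency costs of the kernel path come from.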
InfiniBand Architecture Layers
InfiniBand Message Service
Applications are the “consumers” of the InfiniBand message service. The top layer of the InfiniBand architecture defines the methods an application uses to access the set of services provided by InfiniBand.
Upper Layer Protocols
The upper layer protocols present a standard interface, easily recognizable by the application. Some of the supported upper layer protocols are:
- MPI (Message Passing Interface) - a library interface for distributed/parallel computing
- NCCL - NVIDIA Collective Communication Library
- RDMA Storage Protocols
- IP over InfiniBand
TCP/IP vs. InfiniBand Transport Service
The InfiniBand messaging service is different from the one provided by traditional TCP/IP, which moves data from the operating system in one node to the operating system in another node.
InfiniBand provides hardware-based transport services implemented by the network adapters, also known as HCAs or Host Channel Adapters.
Hardware-based Transport Layer
An end-to-end ‘virtual channel’ is created, connecting two applications that exist in entirely separate address spaces. Once an application has requested transport of a message, it is transmitted by the sending hardware.
When the message arrives at the receiving hardware, it is delivered directly into the receiving application’s buffer.
So far we have discussed end nodes, but a fabric typically also contains networking devices that connect those nodes, such as switches and routers.
- InfiniBand routers are used to connect between different InfiniBand subnets.
- This allows network scaling, traffic isolation, and usage of common resources by multiple subnets.
- The network layer describes the protocol that allows routing of packets between different subnets.
- Routers use network layer addresses called global IDs or GIDs to route packets to the destination node.
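A GID is a 128-bit address formed from a 64-bit subnet prefix followed by the port's 64-bit GUID, and its layout is IPv6-like. A short sketch, using Python's standard `ipaddress` module and an illustrative GUID value:

```python
import ipaddress

def make_gid(subnet_prefix: int, port_guid: int) -> ipaddress.IPv6Address:
    """A GID is 128 bits: a 64-bit subnet prefix identifying the
    InfiniBand subnet, followed by the port's 64-bit GUID. Because the
    layout is IPv6-like, we can render it with the ipaddress module."""
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)

# The default (link-local) subnet prefix, plus an example GUID value.
DEFAULT_PREFIX = 0xFE80_0000_0000_0000
gid = make_gid(DEFAULT_PREFIX, 0x0002_C903_0012_3456)
print(gid)  # fe80::2:c903:12:3456
```

Routers rewrite the local routing header hop by hop, but the GID in the global routing header identifies the destination port across subnets.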
LIDs – Local IDs
Each node in a subnet is assigned with an address called the Local ID or LID. LIDs are assigned and maintained by the Subnet Manager that manages the subnet.
The switches’ forwarding tables are populated with entries that map destination LIDs to exit ports.
Those forwarding tables are calculated by the subnet manager and programmed in the switches’ hardware.
Switching InfiniBand Packets
When a packet is generated by an end node, it includes a source LID and a destination LID. When the packet arrives at a switch, its destination LID is matched against the switch’s forwarding table entries, and the packet is sent out of the corresponding exit port.
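The lookup described above can be sketched in a few lines of Python. This is a toy model, not real switch firmware: the subnet manager conceptually programs a map from destination LID to exit port, and forwarding is then a single table lookup:

```python
# Toy model of an InfiniBand switch forwarding table: the subnet manager
# computes destination-LID -> exit-port entries and programs them into
# each switch; forwarding a packet is then a single lookup.

class Switch:
    def __init__(self, name: str):
        self.name = name
        self.lft: dict[int, int] = {}  # destination LID -> exit port

    def program(self, dlid: int, port: int) -> None:
        """Conceptually invoked by the subnet manager."""
        self.lft[dlid] = port

    def forward(self, packet: dict) -> int:
        """Match the packet's destination LID; return the exit port."""
        return self.lft[packet["dlid"]]

sw = Switch("leaf1")
sw.program(dlid=0x0004, port=7)
pkt = {"slid": 0x0001, "dlid": 0x0004, "payload": b"..."}
print(sw.forward(pkt))  # 7
```

Because the subnet manager precomputes every table centrally, the switches themselves do no route discovery at forwarding time.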
Another key element of InfiniBand is the link layer flow control protocol. Flow control mechanisms are used to adjust the transmission rate between a sender and a receiver so that a fast sender does not overload a slow receiver.
The sending node dynamically tracks receive buffer usage and transmits data only if there is space for it in the receiving node's buffer. This is what makes InfiniBand a lossless network: packets are not dropped in the network during normal operation. Lossless flow control leads to very efficient use of bandwidth within the data center.
All these mechanisms sound great, but ultimately data transfer occurs when bits are transmitted over a physical medium. The physical layer specifies how bits are placed on the wire and the signaling protocol that determines what constitutes a valid packet. In addition, the physical layer defines the characteristics and specifications for copper and optical cables, such as Mellanox LinkX InfiniBand DAC and AOC cables.