Liquid Cooling Technology in Data Centers

NADDOD | Jason, Data Center Architect | May 28, 2024

The data center industry is undergoing a significant transformation, driven by the urgent need for efficient and sustainable cooling. The global data center liquid cooling market, valued at USD 4.48 billion in 2024, is projected to reach USD 12.76 billion by 2029, a compound annual growth rate (CAGR) of 23.31%. This growth is driven largely by increasing reliance on digital infrastructure, the proliferation of IoT devices, and the expansion of cloud computing.
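The market figures above can be cross-checked with the standard compound-growth formula. The sketch below is only a sanity check on the quoted numbers; the function names are illustrative and not taken from any market report.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, cagr: float, years: int) -> float:
    """Value reached after compounding start_value at the given rate for a number of years."""
    return start_value * (1 + cagr) ** years


# Figures quoted in the article (USD billion, 2024 to 2029).
START_2024 = 4.48
END_2029 = 12.76
QUOTED_CAGR = 0.2331

cagr = implied_cagr(START_2024, END_2029, years=5)
projected = project(START_2024, QUOTED_CAGR, years=5)
print(f"Implied CAGR: {cagr:.2%}")
print(f"2029 value at the quoted CAGR: USD {projected:.2f} billion")
```

Both directions agree with the quoted figures to within rounding: the implied rate comes out at about 23.3% per year.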

 

[Figure: Data center liquid cooling market share]

 

As data centers handle more data, traditional air cooling is becoming less viable. Liquid cooling offers better heat dissipation and energy efficiency, and is emerging as a critical alternative. High-performance computing (HPC), AI, and edge computing workloads all demand robust cooling to sustain performance.

 

Driven by the need for efficient energy management and the pursuit of carbon neutrality, liquid cooling has become essential for sustainable data centers. This article explores various types of coolants, their physical properties, and technical requirements, with the aim of promoting liquid cooling in information and communication technology (ICT) and providing practical guidance.

 

1. Data Center Thermal Characteristics

In data centers, thermal density varies significantly across spatial scales. Traditional air-conditioning systems can only regulate the overall or local ambient temperature of the server room, yet inside a server cabinet different components generate very different amounts of heat. For instance, CPU chips dissipate far more heat than components such as memory and power supply units (PSUs), which together account for only 20-30% of the server's total power consumption. Because conventional air cooling cannot target individual components, some components end up over-cooled while others overheat. To prevent overheating, data centers often add cooling capacity or lower the supply air temperature, which results in excessive energy consumption.

 

As the computational power and packaging technology of CPU chips continue to advance, their heat dissipation increases yearly. Currently, high-performance CPU chips exhibit a surface heat flux density of 30-50 W/cm². As the chip structure size continues to shrink, this density is expected to rise, reaching an estimated 100-150 W/cm² within the next five years. The performance projections for high-performance packaged CPU chips are as follows:

 

Item                              2017   2018   2020   2022   2024   2026   2028   2030
CPU Cores                           28     32     42     50     58     66     70     70
Gate Length (nm)                    10     10      7      5      3    2.5    2.1    1.5
Frequency (GHz)                    2.5   2.75   3.10   3.30   3.50   3.70   3.90   4.10
Heat Dissipation per Socket (W)    205    215    237    262    288    318    351    387
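To put the socket heat dissipation row in perspective, the following sketch computes the total and average yearly growth between the first and last years in the table. The values are copied from the table; the helper name is illustrative.

```python
# Socket heat dissipation (W) by year, copied from the table above.
SOCKET_HEAT_W = {
    2017: 205, 2018: 215, 2020: 237, 2022: 262,
    2024: 288, 2026: 318, 2028: 351, 2030: 387,
}


def annual_growth(series: dict[int, float]) -> float:
    """Average compound growth per year between the first and last entries."""
    first_year, last_year = min(series), max(series)
    ratio = series[last_year] / series[first_year]
    return ratio ** (1 / (last_year - first_year)) - 1


growth = annual_growth(SOCKET_HEAT_W)
total_increase = SOCKET_HEAT_W[2030] / SOCKET_HEAT_W[2017] - 1
print(f"Total increase 2017-2030: {total_increase:.0%}")
print(f"Average growth per year: {growth:.1%}")
```

On these figures, per-socket heat nearly doubles over the period, which works out to roughly 5% compound growth per year.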

 

Comparing power density across spatial scales, the data center as a whole is roughly an order of magnitude less power-dense than an individual cabinet, which in turn is about an order of magnitude less power-dense than a CPU chip. In other words, the primary heat sources in a data center are concentrated in the CPU chips.

 

2. Liquid Cooling Technology and Its Applications

What is Liquid Cooling Technology

Liquid cooling is an advanced thermal management technique used to dissipate heat from electronic components, particularly in high-density data centers. Unlike traditional air cooling, which uses fans and heat sinks to transfer heat away from components, liquid cooling employs a coolant to absorb and transport heat more efficiently. This method is especially beneficial for high-performance computing environments, where the thermal demands exceed the capabilities of air cooling.

 

Types of Liquid Cooling Technology

Understanding the thermal characteristics of data center components reveals the superior efficiency and suitability of liquid cooling for high-power density scenarios. For CPU thermal design power (TDP) up to 50W, natural cooling is sufficient. Air cooling is effective for TDPs between 50W and 100W, precision air conditioning works for TDPs between 100W and 200W, while liquid cooling is essential for TDPs above 200W.
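The TDP bands above can be expressed as a small lookup. This is a minimal sketch: the function name is illustrative, and treating each band as including its upper bound is an assumption, since the text does not say which side the boundary values fall on.

```python
def recommended_cooling(tdp_watts: float) -> str:
    """Map a CPU TDP in watts to the cooling approach named in the text.

    Assumption: each band includes its upper bound, because the text does not
    specify how the boundary values (50, 100, 200 W) are classified.
    """
    if tdp_watts < 0:
        raise ValueError("TDP must be non-negative")
    if tdp_watts <= 50:
        return "natural cooling"
    if tdp_watts <= 100:
        return "air cooling"
    if tdp_watts <= 200:
        return "precision air conditioning"
    return "liquid cooling"


for tdp in (35, 95, 165, 205, 387):
    print(f"{tdp:>4} W -> {recommended_cooling(tdp)}")
```

Note that the 205 W socket figure listed for 2017 in the projection table already falls in the liquid cooling band under these thresholds.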

 

Liquid cooling technologies are categorized into direct and indirect cooling, based on whether the coolant directly contacts the heat-generating components.

 

  • Direct Liquid Cooling

Direct liquid cooling involves the coolant directly contacting the heat-generating components to transfer heat. This method is further divided into:

 

Single-phase cooling: The coolant remains in a single physical state (liquid) throughout the cooling process.

 

Phase-change cooling: The coolant changes its physical state (from liquid to gas or vice versa) during the heat exchange process, enhancing heat transfer efficiency.

 

  • Indirect Liquid Cooling

Indirect liquid cooling transfers heat without direct contact between the coolant and the heat-generating components, primarily through thermal conduction via a heat exchanger. This method requires coolants with high thermal conductivity and stability.

 

3. Advantages of Liquid Cooling Technology

High Efficiency

Liquid cooling, both direct and indirect, brings the coolant closer to the heat source, allowing for precise heat transfer and minimizing thermal losses. Unlike traditional water-cooled systems, liquid cooling can operate at higher supply-return temperature differentials, which facilitates natural cooling without compressors in some regions. This increased efficiency can reduce the Power Usage Effectiveness (PUE) of data centers to as low as 1.05, significantly cutting energy consumption and operational costs. For example, data centers utilizing direct liquid cooling can maintain optimal performance even in high-density computing environments without the need for extensive air conditioning infrastructure.
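PUE is the ratio of total facility power to IT equipment power, so a PUE of 1.05 means only 5% overhead on top of the IT load. The sketch below illustrates what that implies in absolute terms. The 1,000 kW IT load and the 1.5 baseline PUE are hypothetical values chosen for illustration, not figures from this article.

```python
def pue(it_power_kw: float, overhead_power_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


def overhead_for_pue(it_power_kw: float, target_pue: float) -> float:
    """Non-IT power (cooling, power distribution, lighting) allowed at a given PUE."""
    return it_power_kw * (target_pue - 1)


IT_LOAD_KW = 1000.0   # hypothetical IT load, for illustration only
BASELINE_PUE = 1.5    # hypothetical air-cooled baseline, not from the article
LIQUID_PUE = 1.05     # best-case figure quoted in the text

baseline_overhead = overhead_for_pue(IT_LOAD_KW, BASELINE_PUE)
liquid_overhead = overhead_for_pue(IT_LOAD_KW, LIQUID_PUE)
print(f"Overhead at PUE {BASELINE_PUE}: {baseline_overhead:.0f} kW")
print(f"Overhead at PUE {LIQUID_PUE}: {liquid_overhead:.0f} kW")
```

Under these assumptions, the non-IT overhead drops from 500 kW to 50 kW for the same IT load.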

 

High Reliability

According to the US Air Force Avionics Integrity Program, temperature, vibration, humidity, and dust are the major factors contributing to electronic equipment failures: temperature accounts for 55% of failures, vibration for 20%, humidity for 19%, and dust for 6%. Liquid cooling, especially direct liquid cooling, immerses heat-generating components in non-conductive coolant, isolating them from airborne dust and vibration. This immersion significantly reduces the risk of failures caused by environmental factors and improves the overall reliability of the system. Servers in a liquid-cooled data center can therefore be expected to suffer fewer failures from overheating and dust accumulation than those in air-cooled environments.

 

Noise Reduction

Liquid cooling systems dramatically reduce noise levels by eliminating or minimizing the need for high-speed fans. In direct liquid cooling setups, fans are often removed entirely, resulting in silent operation without airflow or vibration noise. In indirect liquid cooling systems, low-speed fans are used, significantly lowering the noise generated by air movement and vibrations. This reduction in noise creates a quieter and more conducive environment for data center operations. For example, employees working in liquid-cooled data centers report a more comfortable and less disruptive work environment due to the reduced noise levels.

 

Space Efficiency

Liquid cooling is particularly well-suited for high-computation scenarios such as AI and HPC. It allows for greater computational power deployment within the same physical space, reducing the number of physical devices required. Liquid cooling systems can also operate without compressors, relying on natural cooling sources, which reduces the footprint of air conditioning systems and eliminates the need for large mechanical rooms. This space efficiency is crucial for data centers looking to maximize their computational capacity while minimizing their physical footprint. For example, a data center implementing liquid cooling technology can house more servers and provide higher processing power in a smaller area, optimizing the use of available space.

 

4. Applications of Liquid Cooling Technology

Cold Plate Liquid Cooling

Cold plate liquid cooling, a form of indirect cooling, retrofits servers so that high-power-density components are cooled by liquid through cold plates, while lower-power components remain air-cooled. Matching the cooling method to each component's power density improves efficiency and saves energy. Cold plate systems are further divided into warm water and heat pipe types, with design details varying according to the cooling requirements and power densities of different servers.

 

Warm Water Cold Plate Liquid Cooling: Warm water cold plate systems connect heat-generating components via hard or soft tubing, facilitating efficient heat transfer and natural cooling throughout the year. This design lowers data center energy consumption and achieves low PUE operations by utilizing warm water instead of chilled water, reducing the need for energy-intensive chillers.
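The water flow a cold plate loop needs follows from the steady-state energy balance Q = m_dot * c_p * delta_T. The sketch below estimates the flow per socket using approximate properties of water. The temperature rises are hypothetical design values, and real systems need extra margin for pressure drop and for flow distribution across plates.

```python
WATER_SPECIFIC_HEAT = 4186.0   # J/(kg*K), approximate value for liquid water
WATER_DENSITY = 1000.0         # kg/m^3, approximate; slightly lower for warm water


def required_flow_l_per_min(heat_load_w: float, delta_t_k: float) -> float:
    """Water flow (L/min) needed to carry heat_load_w with a supply-return rise of delta_t_k.

    Uses the steady-state energy balance Q = m_dot * c_p * delta_T.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_load_w / (WATER_SPECIFIC_HEAT * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / WATER_DENSITY
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


SOCKET_HEAT_W = 387.0  # 2030 socket figure from the projection table in section 1
for delta_t in (5.0, 10.0, 15.0):  # hypothetical temperature rises
    flow = required_flow_l_per_min(SOCKET_HEAT_W, delta_t)
    print(f"delta_T = {delta_t:>4.1f} K -> {flow:.2f} L/min per socket")
```

With a 10 K rise, a 387 W socket needs only about 0.55 L/min of water. A larger supply-return differential reduces the required flow in proportion.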

 

Heat Pipe Cold Plate Liquid Cooling: Heat pipe cold plate systems use heat pipes to carry heat from the components to a water loop, which prevents water ingress and potential PCB short circuits. This method provides a robust and efficient cooling solution for high-performance servers, ensuring high reliability and efficient thermal management.

 

Immersion Cooling

Immersion cooling is a direct cooling method where all heat-generating components are submerged in coolant. This technique is divided into single-phase and phase-change immersion cooling:

 

Single-Phase Immersion Cooling: In single-phase immersion cooling, the liquid remains in the same phase (liquid state) throughout the cooling process. The liquid cooling cabinet is connected to a cooling distribution unit, which circulates the coolant to maintain optimal temperatures.

 

Phase-Change Immersion Cooling: Phase-change immersion cooling involves the coolant changing from liquid to gas and back to liquid, using condensers within the cabinet to facilitate this process. This method can efficiently manage higher heat loads and is particularly useful in high-density data center environments.

 

Special handling is required for certain components like optical modules and mechanical hard drives to ensure reliable operation in immersion cooling setups.

 

Spray Cooling

Spray cooling is another direct cooling method that uses a top-down spray design to cover all heat-generating components with coolant. This method allows for precise cooling adjustments based on power density and maintains the existing server deployment shape without requiring significant modifications. Spray cooling can provide highly efficient cooling for high-power density components, making it suitable for advanced data center applications.
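The cold plate, immersion, and spray methods described so far can be summarized as a small data structure that records whether the coolant touches the components. This is an illustrative sketch; the class and field names are assumptions, and only classifications stated in the text are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Contact(Enum):
    DIRECT = "direct"      # coolant touches the heat-generating components
    INDIRECT = "indirect"  # heat passes through a plate or heat exchanger


@dataclass(frozen=True)
class CoolingMethod:
    name: str
    contact: Contact
    variants: tuple[str, ...] = ()


METHODS = (
    CoolingMethod("cold plate", Contact.INDIRECT, ("warm water", "heat pipe")),
    CoolingMethod("immersion", Contact.DIRECT, ("single-phase", "phase-change")),
    CoolingMethod("spray", Contact.DIRECT),
)


def by_contact(contact: Contact) -> list[str]:
    """Names of the methods that use the given contact type."""
    return [m.name for m in METHODS if m.contact is contact]


print("Direct:", by_contact(Contact.DIRECT))
print("Indirect:", by_contact(Contact.INDIRECT))
```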

 

Atomized Jet Cooling

Currently in the research phase, atomized jet cooling is a highly efficient CPU cooling technology. It uses high-pressure gas or liquid to atomize coolant and force it onto the heat-generating surface. This method offers high heat flux and uniform cooling, making it ideal for applications requiring strict temperature control, such as microelectronics, laser technology, and aerospace. Atomized jet cooling has the potential to revolutionize cooling efficiency and reliability in high-performance computing environments.

 

5. NADDOD's Liquid Cooling Modules

NADDOD is a professional provider of innovative optical networking solutions for HPC, AI, data centers, enterprise, and telecom customers, and a leading global transceiver manufacturer. By integrating NADDOD's liquid cooling modules, data centers can achieve exceptional heat dissipation performance, ensuring high performance with low latency even under maximum loads. These modules are compatible with mainstream server models, making them an invaluable asset for modern high-performance computing environments.

 
