How to keep datacentres cool
Power, and efficient cooling, are essential to datacentre operations. This applies equally to a cloud hyperscaler, a commercial colocation facility, or an enterprise’s own datacentre. Without sufficient power, and without the ability to remove excess heat, servers cannot operate.
The need for power and low-cost space, along with better connectivity and automation, has allowed datacentre operators to move away from urban areas. In Europe, this meant a move away from business districts. In the US, operators have opted for states such as Arizona, Nevada and Texas where land is cheap.
But developments in computing technology, and the demand from services such as artificial intelligence (AI), are changing the mechanics and the economics of datacentre development.
On both sides of the Atlantic, pressure on power grids and water supplies is limiting development. And demand is expected to continue to grow sharply, as operators look to pack more equipment into their sites, computer designers pack more processing into denser server packages, and more applications demand power-hungry graphics processing units (GPUs) and other specialist processors.
Artificial intelligence’s rapid growth is adding another set of pressures. In 2023, researchers at the University of California, Riverside, calculated that ChatGPT’s large language model (LLM) uses 500ml of water to answer five to 50 prompts. Almost all that water goes on cooling. And, according to Alvin Nguyen, a senior analyst at Forrester, when an LLM creates an image, it uses roughly as much energy as it takes an internal combustion engine car to drive one mile.
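To put the Riverside figure in per-prompt terms, the short Python sketch below simply divides the quoted 500ml by the two ends of the quoted prompt range. The function name and its parameters are illustrative, not something from the research itself.

```python
def water_per_prompt_ml(total_ml: float, min_prompts: int, max_prompts: int):
    """Return (lowest, highest) water use per prompt, in millilitres.

    The more prompts a given volume of water covers, the less water each
    prompt uses, so the lowest figure comes from the largest prompt count.
    """
    if min_prompts <= 0 or max_prompts < min_prompts:
        raise ValueError("prompt counts must be positive and ordered")
    return total_ml / max_prompts, total_ml / min_prompts


# Figures quoted in the article: 500ml for five to 50 prompts.
low, high = water_per_prompt_ml(500, 5, 50)
print(f"{low:.0f}ml to {high:.0f}ml of water per prompt")
```

On those numbers, each prompt accounts for somewhere between 10ml and 100ml of water.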
These developments are putting pressure on the often-conservative datacentre industry, as well as prompting CIOs to look at alternative technologies.
An enterprise or datacentre operator can do little to bolster power grids. And although firms can move to less power-hungry chips, the trend is still for datacentre power usage to rise.
According to Tony Lock, distinguished analyst at Freeform Dynamics, this is inevitable as enterprises digitise their processes. Work that was done manually is moving to computers, and computers are moving from the office or the data room to a datacentre or the cloud.
“The datacentre is responsible for more and more business service delivery, but the poor old datacentre manager gets the blame for the electricity increases,” he says.
Updating cooling, though, could offer a quick win both for finance and performance.
If operators can improve cooling efficiency, they can pack more equipment into a datacentre. Better cooling is essential to run the GPUs for AI. Nvidia’s Blackwell platform requires liquid cooling, even as it promises to operate at up to 25 times less cost and energy consumption than its predecessor.
By updating cooling technology, firms also have a chance to cut their power bills. The industry consensus is that some 40% of datacentre power is used on cooling.
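As a rough illustration of what that 40% share means for the bill, the Python sketch below estimates total facility power after a cooling upgrade. Only the 40% share comes from the text; the 1,000kW site and the 25% cut in cooling energy are hypothetical inputs chosen for the example, and the function name is illustrative.

```python
def power_after_cooling_upgrade(total_kw: float,
                                cooling_share: float = 0.40,
                                cooling_reduction: float = 0.25) -> float:
    """Estimate total facility power once cooling energy has been cut.

    cooling_share     -- fraction of total power spent on cooling
                         (0.40 is the industry figure quoted above)
    cooling_reduction -- fraction of cooling energy saved by the upgrade
                         (hypothetical; depends entirely on the site)
    """
    if not 0.0 <= cooling_share <= 1.0:
        raise ValueError("cooling_share must be between 0 and 1")
    if not 0.0 <= cooling_reduction <= 1.0:
        raise ValueError("cooling_reduction must be between 0 and 1")
    cooling_kw = total_kw * cooling_share
    other_kw = total_kw - cooling_kw
    return other_kw + cooling_kw * (1.0 - cooling_reduction)


# Hypothetical 1,000kW site with a 25% cut in cooling energy.
before_kw = 1000.0
after_kw = power_after_cooling_upgrade(before_kw)
print(f"Total power: {before_kw:.0f}kW before, {after_kw:.0f}kW after")
```

Under these assumptions, a 25% cut in cooling energy trims the site's total draw by 10%, because cooling is only 40% of the whole.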
Cooling conventions
The conventional way to cool systems in datacentres is by air circulation. Racks and servers are fitted with fans, and datacentres install computer room air-conditioning (CRAC) units or computer room air handler (CRAH) units to keep the air at the correct temperature. These coolers usually vent waste heat to the outside air. If they use evaporative cooling, this needs both electricity and water.
The size and capacity of the CRAC and CRAH units will also determine the physical design of the datacentre, and even its overall size. As David Watkins, solutions director at datacentre operator Virtus Data Centres, points out, each unit is designed to cool a certain number of kilowatts (kW) of capacity, and will have a maximum “throw”, the distance the cool air will reach. With all that taken into account, designers can decide on building dimensions and where to position racks.
Datacentre engineers can also make air cooling more efficient by installing hot and cold aisles. This improves the air circulation by separating incoming cool air and warm air exhaust. It also helps control the ambient temperature in the rest of the datacentre, making it more comfortable for human operators.
But air cooling remains a noisy and expensive process, and one that has its limits – above a certain level of computing power, air is no longer able to provide sufficient cooling. “About 40kW is the upper limit of where you get to with air,” says Watkins.
This means datacentre operators need to look at alternatives.
On tap: liquid cooling
Air cooling has improved gradually over the past couple of decades, and is certainly more efficient than it was. “Air cooling is well established and proven, and has seen incremental improvements in performance,” says Steve Wallage, managing director of datacentre specialist Danseb Consulting.
There are innovations in air cooling, Wallage points out. KyotoCooling, for example, uses a “thermal wheel” to control hot and cold air flows across a datacentre, and can save 75% to 80% over conventional cooling. But such approaches have stayed niche. “Their key reason for remaining niche solutions is their lack of an installed base,” says Wallage.
Instead, liquid cooling has emerged as the main alternative to air cooling, especially for high-performance computing (HPC) and AI installations.
In part, this is because high-performance systems are being shipped with liquid cooling built in. But liquid cooling’s disadvantages, including its complexity and logistical footprint, are offset by its inherent efficiency: it is more efficient than air cooling, and can even use less water than air cooling’s CRAH systems.
Liquid cooling comes in several forms: direct-to-chip cooling; immersion cooling, where the entire device is kept in a non-conductive liquid; and a range of systems that cool the racks. Often, the liquid is not water but a specialist oil.
Immersion systems have to be built in close collaboration with the server or GPU manufacturer so they operate without damage to components. They are popular for cryptocurrency mining and other specialist systems.
Direct-to-chip cooling, again, needs integration by the server manufacturer. As a result, systems with liquid cooling are often shipped with the cooling already configured. The datacentre operator just needs to connect it all up to the main systems, including heat exchangers or cooling distribution units.
“There are existing technologies where you can leverage direct liquid cooling,” says Forrester’s Nguyen. “But it requires extra space, and you can’t really go too dense because you need pipes to all the [heat]-producing chips or assets inside the server.
“And a lot of people don’t like alternatives such as liquid immersion, because you’re working with something that might reduce equipment lifespans, and operationally it makes things a lot more complicated.” Reasons include the need to switch off systems, allow the liquid to cool, and drain it down before carrying out maintenance or upgrades.
IT teams can also opt for simpler rear door, side car or in-row units for liquid cooling.
Rear door cooling, which uses air-to-liquid heat exchangers, is popular because it can be retrofitted to existing racks. There is no direct contact with the chip, so the engineering is less complicated, and the risks are lower. This comes at the cost of reduced cooling performance, however. Rear door cooling systems are completely passive, and will typically cool systems in the 20kW to 120kW range, with some manufacturers claiming higher ratings.
A further advantage of rear door or side car cooling is that it is easier to integrate with conventional air cooling. For the foreseeable future, most datacentres will run air-cooled systems, such as storage and networking, alongside high-performance, liquid-cooled hardware.
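The capacity figures quoted so far can be pulled together into a simple lookup. The Python sketch below is only an illustration: it assumes the 40kW air limit and the 20kW to 120kW rear door range apply to the same unit of load (for instance a rack, which the quoted figures do not specify), and because no limits are quoted here for direct-to-chip or immersion cooling, it never rules those out. The names `CoolingOption`, `OPTIONS` and `candidate_cooling` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CoolingOption(NamedTuple):
    name: str
    min_kw: Optional[float]  # None means no lower figure is quoted
    max_kw: Optional[float]  # None means no upper figure is quoted


# Ranges taken from the figures quoted in the article.
OPTIONS = [
    CoolingOption("air", None, 40.0),
    CoolingOption("rear door heat exchanger", 20.0, 120.0),
    # Assumption: no range is quoted, so this option is always listed.
    CoolingOption("direct-to-chip or immersion", None, None),
]


def candidate_cooling(load_kw: float) -> List[str]:
    """List the cooling approaches whose quoted range covers load_kw."""
    if load_kw < 0:
        raise ValueError("load_kw must not be negative")
    names = []
    for option in OPTIONS:
        above_min = option.min_kw is None or load_kw >= option.min_kw
        below_max = option.max_kw is None or load_kw <= option.max_kw
        if above_min and below_max:
            names.append(option.name)
    return names


for load in (10, 30, 80, 150):
    print(f"{load}kW: {', '.join(candidate_cooling(load))}")
```

A real design decision would also weigh floor loading, plumbing, maintenance access and vendor ratings, none of which this lookup captures.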
“Liquid cooling is an innovative solution, but the technology is not yet in a position to completely replace air cooling in datacentres,” cautions Alistair Barnes, head of mechanical engineering at Colt Data Centre Services.
“And even if equipment is cooled by liquid, heat will be transferred to it and some of this will be dissipated into a room or surrounding space where air will be required to remove this. We recommend a hybrid where liquid and air techniques are used together.”
This allows datacentre operators to maximise both operational efficiency and power efficiency, the latter measured as power usage effectiveness (PUE).
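PUE is the ratio of the total power a facility draws to the power that reaches the IT equipment, so a lower figure is better and 1.0 is the theoretical ideal. The Python sketch below shows the calculation; the site figures are hypothetical and the function name is illustrative.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total power cannot be less than IT power")
    return total_facility_kw / it_equipment_kw


# Hypothetical site: 600kW of IT load, with cooling and other overheads
# falling from 400kW to 300kW after an upgrade.
print(f"PUE before: {pue(1000.0, 600.0):.2f}")
print(f"PUE after:  {pue(900.0, 600.0):.2f}")
```

In this made-up case the PUE falls from about 1.67 to 1.50 without any change to the IT load itself.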
Blue sky, and blue pool, thinking
There are further limits on liquid cooling, and these, along with the growing demand for compute from AI, are prompting datacentre operators to look at even more innovative solutions.
Weight is an issue with liquid cooling because the racks are heavier and can exceed the floor loading the datacentre was designed for. “There will be sites out there that you can retrofit to a degree, but these factors may make it difficult if you’ve a slab [concrete base] of a certain strength,” says Virtus Data Centres’ Watkins.
Some datacentres are using free air cooling, which works well in colder climates such as northern Europe, Scandinavia or the US Pacific Northwest. “Free air cooling is still viable, though some things weren’t considered to begin with,” says Freeform’s Lock. “I think humidity was considered, dust wasn’t.”
Some datacentres are now moving to salt mines for their low humidity. Others are connecting to municipal heating grids so waste heat can be used to heat nearby buildings. This is not new – a number of Scandinavian datacentres have run this way since the 1970s. But European regulations increasingly control how datacentres exhaust excess heat, stipulating that it cannot simply be pumped into the atmosphere.
More radical designs include building large water tanks under datacentres to store waste heat for future use. Equinix’s AM3 datacentre in Amsterdam uses cool water from underground aquifers, as well as free air cooling. Other datacentres use waste heat to heat swimming pools.
Not everyone can relocate to a salt mine, or install swimming pools, but CIOs can plan now to invest in improved datacentre cooling. And they can ask whether their cloud and colocation providers are using cheaper, and cleaner, cooling technology.