Your GPU Cluster Needs a Swimming Pool: The Water Crisis Nobody's Talking About


I got an email last week from a recruiter. Senior role at a company building 'AI-first infrastructure.' The job description was full of the usual buzzwords. Scale. Innovation. Future-ready architecture.

But buried in the third paragraph was something interesting: 'Experience with water-cooled data center operations preferred.'

Not power management. Not cooling systems. Water. Specifically.

That's when I knew we'd moved past the hype phase into the 'uh oh, physics still exists' phase.

The GPUs You Can't Plug In

In my last post, I wrote about how companies are hoarding GPUs they can't use. The utilization numbers are brutal. 85% idle capacity. $120 million sitting unused. Microsoft admitting they have chips in inventory they literally cannot plug in.

I thought the story ended there. I was wrong.

The problem isn't just that companies bought too many GPUs. It's that even if they'd bought exactly the right number, they still couldn't run them. Because modern AI infrastructure has two problems that nobody talks about until after the contracts are signed:

You can't get enough power.

And you can't get enough water.

The Power Problem (Or: Why Microsoft's Chips Are Still In Boxes)

Let's start with why Microsoft's GPUs are sitting in inventory. Satya Nadella didn't say 'we bought too many.' He said: "The biggest issue we are now having is not a compute glut, but it's power. I may actually have a bunch of chips sitting in inventory that I can't plug in because I don't have warm shells to plug into." Translation: We have the GPUs. We have the buildings. We don't have enough electricity to turn them on.

Here's why that's happening. A conventional data center draws as much electricity as 10,000 to 25,000 households. That's a lot. But it's manageable. Utilities can handle that. AI factories are different. Hyperscale AI data centers can use as much power as 100,000 homes or more. That's not an incremental increase. That's a fundamental shift in what infrastructure means.
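
Want to sanity-check those comparisons? The arithmetic fits in a few lines. A sketch in Python, assuming the commonly cited US average of about 10,500 kWh per household per year (my assumption, not a number from any earnings call):

```python
# Rough conversion from "homes" to megawatts of continuous draw.
HOURS_PER_YEAR = 8_760
avg_household_kw = 10_500 / HOURS_PER_YEAR   # ~1.2 kW per home, on average

for homes in (10_000, 25_000, 100_000):
    mw = homes * avg_household_kw / 1_000
    print(f"{homes:>7,} homes ~= {mw:,.0f} MW of continuous draw")

# ->  10,000 homes ~= 12 MW   (a conventional data center)
# ->  25,000 homes ~= 30 MW
# -> 100,000 homes ~= 120 MW  (the floor for hyperscale AI; the biggest
#    announced campuses are planned in the gigawatt range)
```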

Let me put that in context. Meta's Hyperion data center in Louisiana is expected to draw more than twice the power of the entire city of New Orleans. Just one building. More than New Orleans. Another Meta data center planned in Wyoming will use more electricity than every home in the state combined. Every. Home. In. Wyoming.

And it gets worse. U.S. data centers now make up 4.4% of electricity consumption nationwide, up from 1.9% in 2018. By 2028, that number could climb to 12%. One out of every eight kilowatt-hours in America could be going to data centers. Most of it for AI. The grid can't handle it. Not without massive upgrades. Not without years of planning. Not without billions in infrastructure investment that nobody budgeted for because everyone was too busy buying GPUs.

The Heat Problem (Or: Why Everything Needs Water Now)

But even if you solve the power problem, you've still got another one: heat. GPUs run hot. Really hot. And when you pack dozens or even hundreds of them into a single rack, and thousands of racks into a single hall, they generate an obscene amount of thermal energy that has to go somewhere.

Traditional data centers used air cooling. Computer Room Air Conditioning (CRAC) units. Big fans. Cold air in, hot air out. Simple. It doesn't work anymore. Cooling already accounts for 30% to 40% of total data center energy use, just to maintain stable operating temperatures and prevent equipment failure. The new stuff? NVIDIA's Rubin Ultra NVL576 rack, expected in 2027, could consume up to 600 kW. You can't cool 600 kilowatts in a single rack with air. It's physically impossible. The air can't move fast enough, and it can't absorb enough heat.
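
If you want to see why, the physics fits on a napkin. The governing relationship is Q = m_dot * c_p * dT; the temperature rise and intake area below are my own illustrative assumptions, and the air properties are standard textbook values:

```python
# Why a 600 kW rack defeats air cooling: Q = m_dot * c_p * dT.
q_watts   = 600_000    # heat to remove (W)
cp_air    = 1_005      # specific heat of air, J/(kg*K)
rho_air   = 1.2        # density of air, kg/m^3
delta_t   = 15         # assumed air temperature rise across the rack, K
intake_m2 = 1.2        # assumed rack intake face area, m^2

mass_flow = q_watts / (cp_air * delta_t)   # ~40 kg/s of air
vol_flow  = mass_flow / rho_air            # ~33 m^3/s
velocity  = vol_flow / intake_m2           # ~28 m/s through the intake

print(f"{vol_flow * 2118.88:,.0f} CFM at {velocity * 3.6:.0f} km/h")
# -> roughly 70,000 CFM moving at ~100 km/h through the rack face.
#    That's storm-force wind inside a server cabinet.
```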

You need liquid cooling. And that means water. Why water? Because, volume for volume, water absorbs roughly 3,000 times more heat than air. Three thousand times. There's no alternative that works at this scale. Engineered dielectric coolants exist, but many are fluorinated compounds in the same family as "forever chemicals," which makes big tech companies skittish about investing too heavily in them. So it's water. Massive amounts of water. Every single day.
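
That figure isn't hand-waving. It falls straight out of the volumetric heat capacities, which you can check in two lines (standard textbook values, not numbers from this post):

```python
# Heat absorbed per cubic meter of coolant, per degree of temperature rise.
water = 4_186 * 998    # c_p (J/(kg*K)) * density (kg/m^3) ~= 4.18 MJ/(m^3*K)
air   = 1_005 * 1.2    # same calculation for air          ~= 1.2 kJ/(m^3*K)
print(f"water absorbs ~{water / air:,.0f}x more heat per unit volume")
# -> ~3,464x: same ballpark as the figure above
```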

The Numbers (Or: How Much Water Does AI Actually Use?)

Here's where it gets really uncomfortable. A medium-sized data center can consume up to 110 million gallons of water per year for cooling, roughly the annual usage of 1,000 households. Larger data centers can each drink up to 5 million gallons per day, about 1.8 billion gallons annually, equivalent to the usage of a town of 10,000 to 50,000 people. Not a data center campus. One building.
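
The arithmetic holds up, and the town comparison follows from per-person municipal use. The gallons-per-person figures below are common planning numbers, my assumption rather than the article's:

```python
# One building's 5M gallons/day, sanity-checked two ways.
gal_per_day = 5_000_000
print(f"annual: {gal_per_day * 365 / 1e9:.2f} billion gallons")  # ~1.83B, checks out

for gpcd in (100, 150):    # assumed gallons per person per day, total municipal use
    print(f"at {gpcd} gal/person/day -> a town of {gal_per_day // gpcd:,} people")
# -> 50,000 people at 100 gpcd, ~33,000 at 150 gpcd
```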

Nationally, U.S. data centers consume 449 million gallons of water per day and 163.7 billion gallons annually. As of 2021. Before the AI boom really took off. Want to know what that looks like at the individual level? Each 100-word AI prompt is estimated to use roughly one bottle of water, or 519 milliliters. Every time you ask ChatGPT to write an email, you're consuming about a bottle of water. Not directly. But somewhere, in a data center you'll never see, water is evaporating to keep the GPUs cool enough to generate your response. Multiply that by billions of prompts per day and you start to understand the scale.
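
Here's that multiplication, roughly. The 519 mL figure comes from the estimate above; the one-billion-prompts-per-day volume is my hypothetical, chosen purely to show the scale:

```python
# Scaling a per-prompt estimate up to fleet level.
ml_per_prompt   = 519              # from the estimate above
prompts_per_day = 1_000_000_000    # hypothetical volume, for illustration

liters  = ml_per_prompt * prompts_per_day / 1_000
gallons = liters / 3.785
print(f"{gallons / 1e6:,.0f} million gallons per day")
# -> ~137 million gallons/day from prompts alone: roughly a third of
#    the entire 2021 national data center figure
```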

The Texas Problem (Or: What Happens When You Build Without Planning)

Let me show you what this looks like in practice. Texas is currently building over 400 data centers. Many of them are AI-focused hyperscale facilities. And Texas has a water problem. Data centers in Texas are projected to use 49 billion gallons of water in 2025, and as much as 399 billion gallons a year by 2030. Let me put that in perspective. That's equivalent to drawing down the largest reservoir in the US, the 157,000-acre Lake Mead, by more than 16 feet in a year.
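
For the unit-conversion crowd, here's how that comparison pencils out. The conversion factor is standard; the surface area is the full-pool figure quoted above, and the depth of the drawdown depends heavily on how full the lake actually is:

```python
# Converting Texas's projected 2030 draw into reservoir terms.
GAL_PER_ACRE_FOOT = 325_851          # standard US conversion

texas_2030_gal = 399e9               # projected annual use, from above
acre_feet = texas_2030_gal / GAL_PER_ACRE_FOOT
print(f"{acre_feet / 1e6:.2f} million acre-feet")   # ~1.22 million

full_pool_acres = 157_000            # Lake Mead at full pool
print(f"~{acre_feet / full_pool_acres:.1f} ft off a brimming Lake Mead")  # ~7.8 ft
# Today's partially drained lake has a much smaller surface area, so the
# same volume cuts deeper; presumably that's how the 16-foot figure was reached.
```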

And here's the kicker: Data centers are being built faster than state water plans can be updated. Once contracts are signed, once buildings break ground, the water usage is locked in. But nobody's asking where the water is going to come from. A small-to-mid-sized data center is estimated to require about 300,000 gallons of municipal water per day, which is manageable. Mega-campuses like Project Matador and OpenAI's Project Stargate One in Abilene could draw millions. Millions of gallons. Per day. From municipal water supplies. In a state that's simultaneously passing multibillion-dollar programs to address water scarcity.

It's not just Texas. About two-thirds of data centers built since 2022 have been located in water-stressed regions, including hot, dry climates like Arizona. Why would you build water-intensive infrastructure in places that don't have water? Because that's where the power is. Or that's where the tax breaks are. Or that's where the land is cheap. Or that's where the fiber runs. Nobody's optimizing for water availability because water wasn't part of the planning spreadsheet until it was too late.

The Impossible Trade-Off

Here's the part that should make every CFO nervous. You can't optimize for everything. Physics won't let you. Evaporating more water lets a data center avoid running power-hungry chillers. Leaning on chillers instead shrinks the water footprint but ups the power bill and the greenhouse gas emissions.

Pick your poison: burn through water to save electricity (but drain aquifers), burn through electricity to save water (but spike your power bill and carbon footprint), or use exotic coolants to save both (but risk forever chemical contamination). There is no fourth option. And the decision gets made at the facility design phase, years before the first GPU gets plugged in. Once you've built the cooling infrastructure, you're locked in.
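
To make the trade-off concrete, here's a toy model for a hypothetical 100 MW facility. The water-usage and power-overhead numbers are illustrative assumptions in the range of published industry figures, not measurements from any real site:

```python
# Option A: evaporative cooling (thirsty, electrically cheap).
# Option B: mechanical chillers (nearly water-free, power-hungry).
it_load_kw = 100_000
it_kwh_day = it_load_kw * 24             # 2.4M kWh/day of IT energy

wue_evap,  pue_evap  = 1.8, 1.2          # liters evaporated per IT kWh; power overhead
wue_chill, pue_chill = 0.0, 1.4          # assumed chiller-based alternative

water_gal = it_kwh_day * wue_evap / 3.785
extra_kwh = it_kwh_day * (pue_chill - pue_evap)
print(f"Option A evaporates ~{water_gal / 1e6:.1f}M gallons/day")
print(f"Option B avoids that by burning ~{extra_kwh:,.0f} extra kWh/day")
# Neither number reaches zero. You only choose which bill you pay.
```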

The "Zero Water" Lie

Microsoft announced they're solving this with closed-loop, zero-water evaporation cooling that eliminates evaporative water entirely, reducing annual water use by more than 125 million liters per facility. Sounds great, right? Here's what they don't tell you.

"Zero-water" means the system is filled at construction and then recirculates the coolant continuously. Perfect closed loop. No evaporation. No refills. Unless something leaks. Or breaks. Or needs maintenance. Or the system needs to be drained for upgrades. Or any of the thousand things that go wrong in industrial cooling systems. Then you're back to needing water. Lots of it.

And they're currently testing this in Phoenix, Arizona, and Mt. Pleasant, Wisconsin, with operations expected in 2026. Notice the word "testing." Notice "expected in 2026." This isn't production. This is a bet that the technology will work at scale. And if it doesn't, you've got a multi-billion-dollar facility in the desert with no way to cool it.

Why This Matters (Even If You Don't Run a Data Center)

Remember that $4,499 graphics card from my last post? The one I couldn't afford because companies bought all the GPUs? Those companies are now facing a different problem. They bought the GPUs. They can't plug them in. And even if they could, they can't cool them without draining a town's water supply. This isn't a future problem. This is happening right now.

xAI is building Colossus 2 in Memphis, Tennessee, with over half a million NVIDIA GPUs. Half a million. In one facility. NVIDIA and OpenAI announced a partnership to deploy at least 10 gigawatts of NVIDIA systems, representing millions of GPUs, with the first gigawatt coming online in the second half of 2026. Ten gigawatts. That's not a data center. That's an entire power grid.
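
For scale, here's what 10 gigawatts means in grid terms. The reactor and household figures are general-knowledge approximations, not numbers from the announcement:

```python
# Translating 10 GW into familiar units.
deployment_gw    = 10
reactor_gw       = 1.0      # a typical large nuclear reactor
avg_household_kw = 1.2      # average continuous draw of a US home

print(f"~{deployment_gw / reactor_gw:.0f} large nuclear reactors, running flat out")
print(f"~{deployment_gw * 1_000_000 / avg_household_kw / 1e6:.0f} million US homes")
# -> ten reactors' worth of generation, or roughly 8 million homes
```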

And all of it needs water. Lots of water. Water that's coming from the same municipal supplies that serve homes, farms, and businesses. When a data center uses 5 million gallons per day, that's 5 million gallons that aren't going to schools, hospitals, or households. In places like Arizona and Texas that are already water-stressed, this isn't sustainable. But the contracts are signed. The buildings are going up. The GPUs are being installed. And nobody's asking what happens when the water runs out.

The Pattern

Here's what I keep seeing across all of this. Companies that bought GPUs they can't use, running at 30% utilization. Companies that built data centers they can't power, with chips sitting in inventory. Companies that designed cooling systems they can't sustain, draining aquifers faster than nature can refill them.

It's the same pattern at every level. Buy first, plan later. Scale first, solve constraints later. Move fast and break... well, in this case, move fast and break the water table. The AI infrastructure gold rush isn't creating value. It's creating a resource crisis that most people don't even know is happening.

What's Next

This post is about the physical constraints. Power and water. The stuff you can't hand-wave away with better software or smarter scheduling. But even if you solve the power problem and the water problem, you've still got a bigger question: Who actually owns all this infrastructure?

Because right now, nobody knows. Is it the CTO's budget? The infrastructure team? Finance? Security? Procurement? When the power bill spikes, who pays? When the water utility pushes back, who negotiates? When the utilization drops to 20%, who gets blamed? In the next post, we'll talk about the organizational problem. The one where companies spend hundreds of millions on AI infrastructure and nobody can agree on who's responsible for making it work.

Spoiler: It's usually the person who wasn't in the room when the decision got made.


The Bottom Line

Companies are building AI factories that need the power output of a small city and the water consumption of a medium-sized town. They're building them in places that don't have enough power or water to support them. They're making trade-offs between water usage and electricity consumption without understanding the long-term implications. And they're doing all of this for infrastructure that runs at 30% utilization.

The math doesn't work. The physics doesn't work. The planning doesn't work. But the GPUs keep getting ordered. The buildings keep going up. And the water keeps evaporating. Meanwhile, I'm still gaming on my old card. Because the new ones are sitting in a data center somewhere, waiting for power and water that may never come.
