On AI: Environmental Impact 
Earlier in 2025, I started to poke around and experiment with using LLM-based AI.
My purposes have largely been educational. While I absorbed the basics of machine learning during my general computer science education, I have never engaged directly with the topic and have not studied it deeply. I'm unlikely to ever take on a role where I am at the bleeding edge of specializing and training LLMs for specific tasks.
I was seeing credible reports of the best LLM tools becoming more and more capable, and I wanted to build an intuitive feel for what they could and couldn't do, so my focus was primarily on using available models for inference within the nascent tooling environment.
Initially, I wanted to ensure that I had a workflow that I could run myself. If I was going to change the way that I approached building software, I didn't want my new tools to be subject to evaporating VC funds pulling the rug out.
Over the course of the past year, the frontier models from large firms like OpenAI and Anthropic have maintained a comfortable lead in capabilities over open-source and open-weight models like Llama, DeepSeek-R1, and Qwen. As befits a lifelong vim user, most of my usage has been via "CLI agent" tools like Claude Code and Codex, rather than AI-oriented editors like Cursor.
There are a number of objections to the AI industry's behavior and to the usage of LLMs in general. There are some practical objections, but the most serious ones are ethical. If I was going to abstain from what seems like a transformative technology in the case of AI coding agents, I wanted to make sure the reasoning had solid ground. If I was going to use them, I wanted to have open eyes about the drawbacks.
These are the major concerns with AI technology as I understand them:
- LLM training and inference costs cause unacceptable environmental damage and accelerate climate change
- LLMs are "plagiarism" machines, and rather than being "creative" they are only capable of regurgitating the works of authors and artists in their training set
- LLM reliance will erode human skill
- LLMs produce bad output identifiable by unfavored characteristics, i.e. slop
- LLMs appear to users as unbiased arbiters of information but in fact reflect the biases of their training set and can be manipulated to have other biases
- Hallucination makes LLM usage unsuitable or even dangerous for too many tasks
- AI companies are developing the singularity, which has a high probability of resulting in widespread human suffering
The first point in the list above is the focus of most of this post. Environmental impact was my biggest concern regarding LLMs and the whole AI industry, and that's where I'm going to spend most of my time.
I might revisit the rest of these issues in more detail in future, but I wanted to touch on them very briefly before getting into the environmental claims.
The "plagiarism" aspect to AIs feels less problematic for coding models, as the open source community has produced enormous volumes of high quality code under permissive licenses for which training would almost certainly be considered a valid use.
The piracy around training is yet another example of the tech industry ignoring laws (e.g. Uber, Airbnb) with the seemingly accurate conclusion that if they are successful enough, they will evade severe punishment and the laws will be updated to make their lawlessness legal. This feels like real damage; history suggests that when laws only bind the poor, the inevitable reckoning is unpleasant.
The dehumanizing way AIs produce cheap imitations of the personal or house styles of artists of various kinds is also a source of real personal disgust. However, this doesn't seem as relevant to the programming agent context.
Likewise, with respect to programming, I'm cautiously optimistic that the erosion of skill will prove to be an empty threat from skeptics and an empty boast from boosters. Some engineering skills will inevitably lose their value, but I refuse to become the kind of graybeard who thinks anyone who doesn't perfectly replicate my arc is a worthless know-nothing. We've gone through cycles where skills considered foundational become irrelevant before. I never learned how to use a slide rule.
Part of why I think this is that the last year of AI improvement has been evolutionary, not revolutionary. The only people who still seem to strongly believe that this line of development is going to lead to super-intelligence are either relying on that story to scare you or have a net worth that is largely dependent on the AI bubble not popping. We might see some more surprising emergent behaviors from LLMs, but it just feels to me like there is still too much missing for them to eclipse human intelligence.
Environmental Impact
To determine the environmental impact of providing an AI product like ChatGPT, you have to look at the whole lifecycle, from development to delivery. It is useful to break this down into several stages, each with its own environmental impact that may be amortized over the lifetime of the subsequent stages.
The first, increasingly, is building a new datacenter.
Most pre-LLM large datacenters were built by large cloud computing providers like Google, Microsoft, and Amazon. Cloud computing offerings provide a mixed set of capabilities, with focus split across storage, memory, and compute, which is an industry term for metered processor time.
AI tasks perform poorly on classic compute platforms and require specialized hardware.
Currently, the bleeding-edge compute platform for training AI is NVidia's CUDA. Although NVidia is known as a video card maker, video cards are in fact massively parallel processors with very fast memory, and CUDA turns them into a general-purpose computing platform. The capabilities of this platform have led to two massive supply crunches in the past decade, the first for mining cryptocurrency, and now for AI training. These have made NVidia the most valuable company in the world.
In China, where export restrictions have limited access to NVidia's hardware, alternatives are being developed by several big tech companies, including Huawei with its Ascend accelerators and Baidu, whose Kunlun processors aim for CUDA compatibility.
Utilizing the CUDA resources of several established datacenters and distributing AI workloads geographically doesn't make sense, because AI training is data intensive and regional bandwidth is expensive. In a very real sense, space is time, and time is money, so AI companies are building out new datacenters at breakneck speed.
xAI built its Colossus datacenter in Memphis, Tennessee in 122 days. OpenAI, along with Oracle and SoftBank, has already selected 5 sites for its $500bn Stargate initiative, which is going to add 10GW of capacity. These are among the largest datacenters being built, but according to Bloomberg, around 160 AI datacenters have been built in the US in just the last 3 years.
Like any new construction, these projects impact the local environment, sometimes altering or damaging the local ecosystem. They also come with carbon emissions and other macro-scale environmental costs, like heavy metals pollution from the concrete and from the production of the computing equipment they house.
The second cost is the cost of running the datacenters.
Consider Stargate: if that 10GW of power isn't clean energy, it comes with a significant carbon cost, assuming the aging US power grid can even adapt to these kinds of demand increases.
Much of the power going into the datacenter will be turned into heat, which must be dissipated. Although heat pumps like traditional air conditioners are very power efficient, evaporative cooling is about 4x more power efficient. Since water is typically cheaper than electricity, this makes consuming water to dissipate heat more cost effective, too.
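To get an intuition for why the water and energy numbers track each other, here is a rough back-of-envelope sketch of my own; the latent heat figure and the all-evaporation assumption are mine, not something from the sources discussed below:

```python
# Back-of-envelope: how much water does evaporative cooling consume per kWh of heat rejected?
# Assumption (mine): all heat is carried away by evaporation, at water's latent heat of
# vaporization of roughly 2.4 MJ/kg. Real cooling towers also lose water to drift and
# blowdown, so treat this as a rough lower bound.

LATENT_HEAT_MJ_PER_KG = 2.4   # approximate latent heat of vaporization of water
MJ_PER_KWH = 3.6              # 1 kWh = 3.6 MJ

liters_per_kwh = MJ_PER_KWH / LATENT_HEAT_MJ_PER_KG  # 1 kg of water is ~1 L
print(f"~{liters_per_kwh:.1f} L evaporated per kWh of heat rejected")  # -> ~1.5 L/kWh
```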
This is where the claim that "AI datacenters consume vast amounts of water" comes from, although this problem is not limited to AI specific datacenters. In reality, modern datacenter cooling involves a sophisticated collaboration among several cooling technologies, but evaporative cooling is used even in humid areas because of its cost efficiency.
For practical reasons, the cost of running the datacenter is often split into two stages, training models and running inference, a split similar in character to building versus running the datacenter itself.
To simplify drastically, training is the process of creating a new model. The names you see like GPT-5 from OpenAI, Gemini from Google Deepmind, Grok from xAI, or Sonnet and Opus from Anthropic; these are all either models or series of models. Although training involves an iterative R&D process, once the model is delivered, the cost of training has more or less already been paid.
When you send a query to ChatGPT or ask Claude Code a question, that query is eventually submitted to a model. The response you get is a result of a process called inference, but it's close enough to think of this as the cost of "execution" of a query against a particular model.
Training is exceptionally expensive and is among the most compute-intensive tasks in modern computing. Training can take millions of hours of GPU time on very expensive NVidia boards. It's estimated that training GPT-5 used ~50-100 thousand NVidia H100 class GPUs for 3 months. At a typical rental cost of ~$2-3/hr this represents a market rate cost of ~$250-500m. According to MIT Technology Review, training GPT-4 used 50GWh of energy, enough to "power San Francisco for 3 days."
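As a sanity check, the dollar figure follows directly from the quoted ranges; the sketch below is just that arithmetic, not anything published by OpenAI:

```python
# The market-rate training cost arithmetic spelled out, using only the ranges quoted above:
# 50-100 thousand H100-class GPUs running for ~3 months at ~$2-3/hr rental.

HOURS = 3 * 30 * 24  # ~3 months of continuous wall-clock time

for gpus, rate_per_hr in [(50_000, 2.0), (100_000, 3.0)]:
    cost = gpus * HOURS * rate_per_hr
    print(f"{gpus:,} GPUs x {HOURS:,} hrs x ${rate_per_hr}/hr = ${cost / 1e6:,.0f}M")

# -> $216M at the low end and $648M at the high end, bracketing the ~$250-500m estimate
```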
Despite this, the cost of inference seems to be outpacing the cost of training as the scale of AI usage increases. Even though the cost of each query is very low, if there is a lot of usage over the lifetime of the model, inference costs will eventually overtake training costs. In their paper How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference, Jegham et al. claim:
Recent estimates suggest inference can account for up to 90% of a model’s total lifecycle energy use
Jegham et al. go into some detail on their model for estimating power use, water use, and carbon emissions per model based on a variety of query types. Additional factors that are difficult to track and impossible to control for can also impact water and carbon intensity, such as the time of day of a query.
A team at the University of Michigan led by PhD candidate Jae-Won Chung has attempted to control for this, running open models under controlled conditions on several popular hardware platforms. Their ML.energy Leaderboard suggests that energy usage is fairly strongly correlated with parameter count, which is somewhat expected, and they also provide detail for different tasks such as coding, image generation, and video generation. Their energy figures are given in joules, so keep in mind that a watt is 1 J/sec and a watt-hour is 3,600 J.
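To line their joule figures up with the per-query watt-hour numbers that appear later in this post, the conversion is a single division; the example input below is hypothetical, not one of their measurements:

```python
# Converting joules-per-response figures into the watt-hours used elsewhere in this post.
# The 2,000 J value below is purely illustrative; it is not a measurement taken from the
# ML.energy Leaderboard.

JOULES_PER_WH = 3600  # 1 Wh = 1 W sustained for 3,600 s

def joules_to_wh(joules: float) -> float:
    return joules / JOULES_PER_WH

print(f"{joules_to_wh(2000):.2f} Wh")  # a hypothetical 2,000 J response -> ~0.56 Wh
```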
So, how much energy is actually being used?
I'd guess that to a lot of people, hearing something like "training GPT-3 used up enough water to fill 2/3rds of an Olympic-sized swimming pool" or "enough energy to power San Francisco for 3 days" sounds like a lot. Those swimming pools hold 2.5 million liters! If you filled an office water cooler jug and dumped it into an Olympic swimming pool once per minute, it would take you about 90 days to fill it.
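For the curious, that 90-day figure is just the pool volume divided by one jug per minute, assuming a standard ~19 L (5 gallon) jug:

```python
# The jug-into-the-pool arithmetic, spelled out.
# Assumption: a standard office water cooler jug holds about 19 L (5 US gallons).

POOL_LITERS = 2_500_000    # Olympic-sized swimming pool
JUG_LITERS = 19            # typical office water cooler jug
MINUTES_PER_DAY = 24 * 60

days = POOL_LITERS / JUG_LITERS / MINUTES_PER_DAY
print(f"~{days:.0f} days")  # -> ~91 days of one jug per minute, around the clock
```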
I've been cursed blessed with the experience of building computing platforms at something approximating global scale, and these numbers seemed kind of low given the alarm that's been raised. It used less water than it takes to fill a single Olympic pool? Between Florida and California, there are roughly 3 million swimming pools. They used 1% of the yearly energy consumption of a city representing 0.25% of the US population? Is that really a crisis?
Jegham et al. elaborate on how these numbers can compound to become more alarming. They go into detail on the GPT-4o model, where they found that a short query uses roughly 0.42Wh. The authors compare this to the cost of a Google search, estimated at 0.30Wh. This is a comparison that AI proponents like to make, but the authors go on to claim that medium-sized queries consume ~10Wh, and their conservative estimates put the small-to-medium split at 80/20.
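Run that split through the quoted figures and the flattering search comparison falls apart; the blended average below is my own arithmetic from the paper's numbers, not something the authors report directly:

```python
# What the 80/20 small/medium split implies for a blended per-query average.
# The per-query figures are the ones quoted above from Jegham et al.

SHORT_WH, MEDIUM_WH = 0.42, 10.0
SHORT_SHARE, MEDIUM_SHARE = 0.80, 0.20
GOOGLE_SEARCH_WH = 0.30

blended = SHORT_SHARE * SHORT_WH + MEDIUM_SHARE * MEDIUM_WH
print(f"blended: {blended:.2f} Wh, ~{blended / GOOGLE_SEARCH_WH:.0f}x a Google search")
# -> ~2.34 Wh per query, roughly 8x the search estimate rather than 1.4x
```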
A big reason for this kind of disparity is that LLM queries have a much bigger variance in complexity than search queries do. An LLM query is frequently an entire paragraph, and coding tools carry with them a fairly large context window that allows the AI platform to understand the current query with some knowledge of its recent inputs and outputs.
While the academic community is doing important work coming up with methods of estimation and classification for LLM environmental impact, they don't have the actual numbers from running such a system at their disposal. Their work is akin to establishing a kind of AI Drake Equation whose structure is valuable but whose component values are not known.
So it's interesting to compare their results to what we get from the major industry players, such as those presented by Google themselves in their Aug 2025 paper Measuring the environmental impact of delivering AI at Google Scale:
[W]e find the median Gemini Apps text prompt consumes 0.24 Wh of energy—a figure substantially lower than many public estimates. [..] Google’s software efficiency efforts and clean energy procurement have driven a 33x reduction in energy consumption and a 44x reduction in carbon footprint for the median Gemini Apps text prompt over one year.
In their summary post, they elaborate:
We estimate the median Gemini Apps text prompt uses 0.24 Wh of energy, emits 0.03 grams of carbon dioxide equivalent (gCO2e), and consumes 0.26 milliliters (or about five drops) of water.
Citing figures in the same ballpark, but with no explanation of methodology and no further details, Sam Altman makes similar claims for ChatGPT in his post "The Gentle Singularity" from June 2025:
People are often curious about how much energy a ChatGPT query uses; the average query uses about 0.34 watt-hours, about what an oven would use in a little over one second, or a high-efficiency lightbulb would use in a couple of minutes. It also uses about 0.000085 gallons of water; roughly one fifteenth of a teaspoon
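For what it's worth, those comparisons check out arithmetically under reasonable assumptions about the appliances; the oven and bulb wattages below are my guesses, not figures from the post:

```python
# Sanity-checking the household comparisons in that quote. Assumptions (mine): a ~1 kW
# average oven draw and a 10 W high-efficiency LED bulb.

QUERY_WH = 0.34
QUERY_GALLONS = 0.000085

oven_seconds = QUERY_WH / 1000 * 3600      # seconds a 1 kW oven runs on 0.34 Wh
led_minutes = QUERY_WH / 10 * 60           # minutes a 10 W LED runs on 0.34 Wh
teaspoons = QUERY_GALLONS * 3785.4 / 4.93  # gallons -> mL -> teaspoons

print(f"oven: {oven_seconds:.1f} s, LED: {led_minutes:.1f} min, water: 1/{1 / teaspoons:.0f} tsp")
# -> ~1.2 s of oven time, ~2 min of LED light, ~1/15 of a teaspoon, consistent with the quote
```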
Google has a long history of transparency in providing environmental impact studies. I would imagine they are on the high end of the datacenter energy and water efficiency spectrum compared to the rest of the industry, or their competitors would be crowing about their own accomplishments. Their development and deployment of custom TPUs would, in theory, also help make their AI workloads more efficient.
The best numbers I can find estimate that, in the US, datacenter CO2 output across the entire industry is similar to that of the commercial airline industry, each accounting for ~2-2.5% of total carbon emissions. While AI usage is only a portion of that, we can expect its share within datacenters to grow, given the rapid rate of expansion.
The AI ecosystem is rife with fantastic claims from techno-optimists for whom developing more AI at an ever increasing rate is in fact the answer to these problems. They say:
- The inevitable AI super-intelligence will help us organize the entire carbon economy more efficiently, ultimately leading to a reduction.
- Market forces will push development in TPUs or other AI dedicated hardware, leading to efficiency gains that will flatten the curve of AI energy usage.
As with neoliberal economics, I'm always suspicious of people who claim that the answer to the damage their self-serving behavior is causing is actually more of that behavior. I don't see much evidence that either of these outcomes is likely.
On the first point, we already have a metric for measuring datacenter energy efficiency, called PUE (Power Usage Effectiveness), which is essentially the ratio of total facility power to the power that goes into computation rather than other datacenter requirements like cooling. The best possible PUE is 1.0, meaning all energy goes towards computation.
The best datacenters in the world have PUEs of ~1.04, with the ESIF from the National Renewable Energy Laboratory reporting a PUE of 1.036. Google's fleet-wide PUE has gone from 1.24 to 1.09 between 2009 and 2025, with its fleet-best datacenter registering a PUE of 1.04. There simply isn't any juice left in that stone for a super-intelligence to squeeze out.
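To put those numbers in perspective, here is the overhead fraction each PUE implies (my own arithmetic from the figures above):

```python
# How little headroom those PUE numbers leave. PUE is total facility energy divided by
# IT equipment energy, so the share of total power going to cooling and other overhead
# is 1 - 1/PUE.

for label, pue in [("Google fleet, 2009", 1.24), ("Google fleet, 2025", 1.09), ("Best in class", 1.04)]:
    overhead = 1 - 1 / pue
    print(f"{label}: PUE {pue} -> {overhead:.1%} overhead")

# -> ~19% in 2009, ~8% today, ~4% at best: even eliminating overhead entirely would
#    shave only single-digit percentages off total energy use
```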
But perhaps it can help with the second claim?
On the one hand, the AI chip and accelerator market is a booming industry that already has more unicorns than a child's bedroom. Among the new players are some slightly more established companies like Groq and Cerebras, as well as Jim Keller's latest venture, Tenstorrent.
As of yet, none of these companies has been able to show significant gains over an H100 in a piece of hardware whose manufacturing process scales out to something like the 200,000 units that xAI has ordered from NVidia for Colossus. A handful have managed to make a different set of tradeoffs to see some low multiples (e.g. 4x, 8x) over existing technology, but at significant upfront costs, either in hardware that is more expensive or more difficult to program, or in ASICs that have to be re-programmed for every new model.
My conclusions from this reading:
- AI's ecological impact is more modest than reported, especially compared to its common characterization on social media, but its rapid growth makes it a problem worth focusing resources on.
- There are well-aligned cost incentives for AI companies to reduce power usage, and judging by their PUEs they typically run very efficiently.
- There isn't as strong an incentive structure for water.
- AI ecological impact is often described in alarmist ways, like enough to power 29,000 U.S. homes for a year, which is "about as much as Jonesboro, Ark.", a place I've never heard of with fewer than half as many people as the Upper East Side of Manhattan. Regarding water, Google's entire fleet of datacenters used approximately 16.7 billion liters in 2021, which sounds like a lot, but as they explain, that's equivalent to the water use of 29 golf courses in the American Southwest.
- The problem of water usage is highly situational, being no big deal or a truly alarming disaster waiting to happen depending on local water conditions. Surface water is consistently replenished by the water cycle, but many datacenters are built in locations that draw on aquifers, which recharge on much longer timescales, and could leave nearby communities without enough water for people to use. Water scarcity is likely to become a major problem in the coming century. It's not clear that datacenters are particularly wasteful compared to other industrial practices, but they are an accelerant in creating this crisis in many of the places they are being built.
- AI inference's environmental impact is still poorly understood, but it seems likely that the academic estimates that form the groundwork of most public information on CO2 and water consumption are over-estimates born of a lack of empirical data.
A final, parting conclusion:
The ethical carve-out I give AI for the task of programming might simply be a form of apologetics. I believe the things I wrote, but I'm truly not sure if I'm just trying to justify my mild interests to myself.
I take the ethical concerns about AI to heart. The power of LLMs for producing software alone doesn't justify the valuations or the amounts of money being poured into them; their backers clearly want to produce something greater than that, and I think that greater thing, if they can get there, is transformative in a bad way and represents a great danger.
AI in its current form feels like a very valuable tool for certain applications; its ability to assist in scientific research, its uses in medical diagnostics, its uses in engineering, all seem very promising. These are niches where we have simply not been able to train enough people to do this kind of work, where AI systems will be guided and steered by human experts, and where their outputs can be checked for hallucinations and verified.
It's sad to me that this is inexorably tied to something that many people hate reflexively, their hatred backed up by hyperbolic nonsense, when a fuller and more nuanced understanding would still provide sufficient justification for their concerns.