My thoughts.

Current gen AI is going to result in the excess-datacenter equivalent of dark fiber from the 2000s: lots of early buildout and heavy investment, followed by a lack of customer demand and, later, cheaper access to physical compute.

The current neural network software architecture is pretty limited. Hundreds of billions of dollars of investor money has gone into scaling backprop networks, and we've quickly hit the limits. There will be some advancements, but it's clear we're already at the flat part of the current S-curve.

There's probably some interesting new architectures already in the works either from postdocs or in tiny startups that will become the base of the next curve in the next 18 months. If so, one or more may be able to take advantage of the current overbuild in data centers.

However, compute has an expiration date like old milk. It won't physically expire, but its economic potential decreases as the technology advances. If the timing is right, though, there is going to be a huge opportunity for the next early adopters.

So what's next?

If the end result here is way overbuilt energy infrastructure that would actually be great. There’s a lot you can do with cheap electrons.

I suspect it will mostly be fossil power capacity, which is much easier to scale up.

I wouldn’t be so sure about that. Several of the big names in this space have green-energy pledges and are actively building out nuclear power.

Nobody is actively building out nuclear power. Microsoft is turning on a recently decommissioned facility.

New nuclear is too expensive to make sense. At most there are small investments in flash-in-the-pan startups that are failing to deliver on their plans for small modular reactors.

The real build out that will happen is solar/wind with tons of batteries, which is so commonplace that it doesn't even make the news. Those can be ordered basically off the shelf, are cheap, and can be deployed within a year. New nuclear is a 10-15 year project, at best, with massive financial risk and construction risk. Nobody wants to take those bets, or can really afford to, honestly.

Plenty of bets are being placed on nuclear, but they are moonshot-style bets.

From where I'm standing, the immediate capital seems to be going into smaller-scale (2-5 MW) natural gas turbines co-located on site with the load. I haven't heard about many battery deployments at the same scale.

Of course, turbine deliveries are now booked out to 2029 or so.

I'm only marginally at the edge of this space these days, though, so what I hear is through the grapevine and no longer direct.

As far as the grid goes, there are a few gas additions, but it's mostly solar, batteries, and wind:

https://www.eia.gov/todayinenergy/detail.php?id=65964

Of course, remember that nameplate capacities from different technologies should be corrected for capacity factor, which is roughly 60% for gas, 40% for wind, and 25% for solar. Pre-correction, the EIA expects:

    solar: 33.3GW
    battery: 18.3GW
    wind: 7.7GW
    gas: 4.7GW
And then there's an expected retirement of 1.6GW of old gas this year.
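
As a rough back-of-the-envelope check (my own arithmetic, not the EIA's), here's what those nameplate figures look like after applying the approximate capacity factors above; batteries are storage rather than generation, so no factor is applied to them:

    # Scale EIA nameplate additions by rough capacity factors to estimate
    # average delivered power. Factors are the approximate values quoted
    # above; batteries are storage, so they are left out of this conversion.
    nameplate_gw = {"solar": 33.3, "wind": 7.7, "gas": 4.7}
    capacity_factor = {"solar": 0.25, "wind": 0.40, "gas": 0.60}

    for tech, gw in nameplate_gw.items():
        avg_gw = gw * capacity_factor[tech]
        print(f"{tech:>5}: {gw:5.1f} GW nameplate -> ~{avg_gw:4.1f} GW average")

    # Net out the expected 1.6 GW of gas retirements at the same ~60% factor.
    print(f"net gas after retirements: ~{(4.7 - 1.6) * 0.60:.1f} GW average")

Corrected that way, solar still comes out well ahead of gas on expected average output.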

I'm pretty disconnected from the data center folks, but in general the current political environment is highly unfavorable to solar and batteries, and leaning on them too heavily could invite a lot of very expensive political blowback.

Of course, small gas also has the benefit that the operating costs are spread over the lifetime of the system rather than paid up front. So even if solar plus batteries is cheaper than gas over the lifetime, gas may seem more expedient if you don't want a lot of capital on the books.
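
To make that concrete, here's a tiny sketch with entirely made-up numbers (assumptions for illustration only, not from this thread or any real quote) showing how one option can be cheaper over its lifetime yet tie up far more capital on the books up front:

    # Hypothetical, undiscounted lifetime cost per MW over 20 years.
    # All dollar figures are assumptions for illustration only.
    YEARS = 20

    solar_batt_capex = 2.5e6   # $/MW up front (assumed)
    solar_batt_opex  = 0.02e6  # $/MW/year O&M (assumed, near-zero fuel)

    gas_capex = 1.0e6          # $/MW up front (assumed)
    gas_opex  = 0.15e6         # $/MW/year fuel + O&M (assumed)

    solar_total = solar_batt_capex + YEARS * solar_batt_opex   # 2.9M
    gas_total   = gas_capex + YEARS * gas_opex                 # 4.0M

    print(f"solar+batteries: ${solar_total/1e6:.1f}M/MW lifetime, ${solar_batt_capex/1e6:.1f}M up front")
    print(f"small gas:       ${gas_total/1e6:.1f}M/MW lifetime, ${gas_capex/1e6:.1f}M up front")

With these made-up numbers gas costs more over 20 years but ties up less than half the up-front capital, which is exactly the expediency argument.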

> The real build out that will happen is solar/wind with tons of batteries

That actually sounds awesome, is there a downside I’m not seeing?

If you're a utility, you may not like that solar and batteries are driving down electricity prices and reducing grid expenses. But even with thumbs on the scale, we are seeing the most nameplate deployment in decades (see caveats in my parallel reply), and will likely set a record, thanks to solar, batteries, and wind, in that order:

https://www.eia.gov/todayinenergy/detail.php?id=65964

There are a couple of companies doing HTGRs and SMRs that seem to be on track.

> There's probably some interesting new architectures already in the works either from postdocs or in tiny startups

It is not clear to me why we will have a breakthrough after virtually no movement on this front for decades. Backpropagation is literally 1960s technology.

Your comment sort of implies that all of this is some super-standardized flow that is well studied and highly optimized, but in my experience all this ML stuff is closer to the edge of broken than to some kind of local maximum.

There is an ungodly number of engineering decisions that go into making ML work, and any number of stupid things all over the place can cause it to fail.

Something stupid like bad normalization, a bad data mix, bad learning rates, precision issues, a bad init, architectural problems that cause poor training, or just straight-up bugs somewhere: your batching was doing something silly, there's a numerically unstable division or sqrt somewhere, and so on.
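
As one concrete (hypothetical) example of the "numerically unstable division" class of bug, a normalization that divides by a standard deviation that can be zero quietly produces NaNs; the usual defensive fix is a small epsilon:

    import numpy as np

    def normalize_buggy(x):
        # Looks fine, but a constant feature column has std 0, so the
        # division yields NaNs that silently poison everything downstream.
        return (x - x.mean(axis=0)) / x.std(axis=0)

    def normalize_fixed(x, eps=1e-8):
        # Same thing with a small epsilon added to the denominator.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

    batch = np.array([[1.0, 5.0],
                      [2.0, 5.0],   # second feature is constant
                      [3.0, 5.0]])

    print(normalize_buggy(batch))  # second column is nan (0/0)
    print(normalize_fixed(batch))  # second column is just 0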

At scale, with stupid issues like hardware faults on top, I imagine this only gets exponentially worse.

And then on the product side, integrating all of this, more bugs sneak in: so many labs released open-source LLMs with broken or incorrectly configured chat templates that massively tanked performance.

Or they set some sampling parameters wrong and the output gets stuck in loops or hallucinates a ton.

In his 2025 Hot Chips keynote, Noam Shazeer (a Google DeepMind VP) even says that you need hardware determinism because there are just so many bugs in ML experiments that you need to be able to tweak and test things.

Also, there are just so many obvious issues with the conventional GPT-2-style setup: softmax causing attention sinks at punctuation and dispersing attention over longer sequences because of low sharpness, and the whole privileged-basis thing making common information take up a lot of model capacity.
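
A minimal sketch of the dispersion point (NumPy, just for illustration, not how any particular model implements attention): softmax always hands out exactly one unit of attention, so when no key is strongly relevant the mass either spreads thin over a long sequence or gets parked by the model on a default "sink" token such as punctuation or BOS:

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    # One query over keys with roughly equal, low relevance scores.
    short_seq = softmax(np.array([0.10, 0.20, 0.10, 0.15]))
    long_seq  = softmax(np.full(1000, 0.10))

    print(short_seq)       # each of 4 tokens gets ~0.25
    print(long_seq.max())  # ~0.001: the same unit of attention spread over 1000 tokens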

I'd like to add that in the recent Y Combinator podcast with Anthropic's head of pretraining, bugs come up as a major issue [1].

It is so easy to have good ideas broken by random bugs everywhere...

[1] https://youtu.be/YFeb3yAxtjE?t=2919

Because tremendous rewards will spur a huge increase in research?

This. I totally agree we will see better architectures for doing the calculations, lower-energy inference hardware, and some models running locally, moving some of the "basic" inference stuff off the grid.

I think it's going to move fast, and I would not be surprised if the energy cost of inference is 1/10 of today's in less than 5 years.

That said, Jevons paradox will likely mean we still use more power overall.

Hopefully there's a flood of good cheap used Supermicro and other enterprise gear and maybe a lot of cheap colo.

Yes, that makes sense.

Also adding to that tendency: I suspect that as the tech matures, more and more consumer-space models will just run on-device (sure, the cutting edge will still run in server farms, but most consumer use will not require the cutting edge).

This is one possibility I'm assuming as well. It largely depends on how long this bubble lasts. At the current growth rate it will become unsustainable before many very large DCs can be built, so the impact may not be as severe as the telecom crash.

Another possibility is that new breakthroughs significantly reduce computational needs, efficiency improves substantially, or similar advances reduce DC demand.

It's a line (remindme! 5 years)