It's so interesting to think about how much power it takes these machines to "think". I think I had a vague notion that it was "a lot" but it's good to put a number on it.

If DS4 Flash peaks at 50W and is 280B parameters, does that mean DS4 Pro at 1.6T parameters would likely be 300W or so? And the latest GPT 5 and Opus which feel maybe comparable-ish around 500W? Is it fair to say that when I'm using Claude Code and it's "autofellating" or whatever I'm burning 500W in a datacenter somewhere during that time?
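
Back of the envelope, assuming power scales linearly with total parameter count (my naive assumption, which may well be wrong):

    # Naive scaling: watts proportional to TOTAL parameter count.
    flash_watts  = 50       # peak draw quoted for DS4 Flash
    flash_params = 280e9    # 280B total parameters
    pro_params   = 1.6e12   # 1.6T total parameters
    print(flash_watts * pro_params / flash_params)  # ~286 W, so ~300 W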

There isn't a relationship between parameter count and energy use like that. You could run a 280B parameter model on a Raspberry Pi with a big SSD if you were so determined. The energy use would be small, but you would be waiting a very long time for your response.

Data center energy use isn't simple to calculate because servers are configured to process a lot of requests in parallel. You're not getting an entire GPU cluster to yourself while your request runs; your tokens are batched with a lot of other people's for efficiency.

This is why some providers can offer a fast mode: Your request gets routed to servers that are tuned to process fewer requests in parallel for a moderate speedup. They charge you more for it because they can't fit as many requests into that server.
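
Rough sketch of the pricing logic, with invented batch sizes and arbitrary cost units:

    # If a server's hourly cost is roughly fixed, per-request cost
    # scales inversely with how many requests share the server.
    # Batch sizes and the base price are invented for illustration.
    normal_batch = 128   # assumed concurrent requests, normal mode
    fast_batch   = 32    # assumed concurrent requests, fast mode
    base_price   = 1.0   # arbitrary cost units per request, normal mode
    print(base_price * normal_batch / fast_batch)  # 4.0x for the fast lane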

You're thinking about power use, not energy. There are systems that can more directly minimize energy per operation at the cost of high latency, but they look more like TPUs than Raspberry Pis.
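
To make the distinction concrete, a toy comparison (wattages and runtimes are illustrative guesses, not measurements):

    # Power is a rate (watts); energy is what's consumed (watt-hours).
    # A low-power machine can burn MORE energy per response if it's
    # slow enough. All numbers here are illustrative assumptions.
    pi_watts,  pi_hours  = 7,   8          # a Pi grinding away for 8 hours
    gpu_watts, gpu_hours = 500, 20 / 3600  # a GPU box answering in 20 s
    print(pi_watts  * pi_hours)    # 56 Wh on the "low-power" Pi
    print(gpu_watts * gpu_hours)   # ~2.8 Wh on the "power-hungry" server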

Energy use for any given request is going to be roughly proportional to active parameters, not total. That would be something like 13B for Flash and 49B for Pro. So you'd theoretically get something like 190W if you could keep the same prefill and decode speed as Flash, which is unlikely.
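
Same arithmetic in code, using the active-parameter estimates above (the proportionality itself is the approximation):

    # Scale Flash's ~50 W by the ratio of ACTIVE parameters per token.
    flash_watts  = 50     # peak draw quoted upthread
    flash_active = 13e9   # my rough estimate of Flash's active params
    pro_active   = 49e9   # my rough estimate of Pro's active params
    print(flash_watts * pro_active / flash_active)  # ~188 W, call it 190 W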

Batching lowers that, since the model weights are read from memory once per step for the whole batch. Activation accumulation doesn't scale as nicely.
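
A sketch of decode-step memory traffic under batching (fp16 weights assumed; the per-request KV-cache figure is made up):

    # Decode is mostly memory-bandwidth bound. Weight reads are shared
    # across the batch; per-request KV-cache/activation reads are not.
    weight_bytes     = 49e9 * 2   # 49B active params at 2 bytes (fp16)
    kv_bytes_per_req = 200e6      # assumed KV-cache read per request/step
    for batch in (1, 32, 256):
        gb = (weight_bytes / batch + kv_bytes_per_req) / 1e9
        print(f"batch {batch:>3}: ~{gb:.2f} GB moved per request per step")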

Power isn't proportional to parameters. It may be vaguely proportional to tokens/s, although batching screws that up.

Claude Sonnet is probably running on an 8-GPU box that consumes 10 kW, while Opus might use more like 50 kW, but that's shared by a bunch of users thanks to batching.
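
If those guesses are close, your personal share is roughly the box's draw divided by the number of concurrent requests (the request counts below are further guesses):

    # Per-user power share under batching. Box wattages are the guesses
    # above; concurrent-request counts are additional guesses.
    for box_kw, users in ((10, 64), (50, 128)):
        print(f"{box_kw} kW box / {users} users ~ {box_kw * 1000 / users:.0f} W each")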