Hacker News

LLM inference uses on the order of 1 Wh per query. That's less energy than driving an EV for 10 meters or running an air conditioner for 5 seconds.
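A quick back-of-envelope check of those two comparisons. The consumption figures are assumptions, not from the comment: an EV using roughly 150 to 200 Wh/km, and an air conditioner drawing roughly 1 to 3.5 kW. The function names are just for illustration.

```python
# Rough energy-equivalence check for the "~1 Wh per query" figure.
# ASSUMED inputs (not from the thread): EV consumption of 150-200 Wh/km,
# air conditioner power draw of 1-3.5 kW.

QUERY_WH = 1.0  # order-of-magnitude estimate quoted in the comment


def ev_meters(energy_wh: float, wh_per_km: float) -> float:
    """Distance in meters an EV covers on energy_wh at wh_per_km consumption."""
    return energy_wh / wh_per_km * 1000.0


def ac_seconds(energy_wh: float, power_kw: float) -> float:
    """Seconds an AC unit drawing power_kw can run on energy_wh (1 Wh = 3600 J)."""
    return energy_wh * 3600.0 / (power_kw * 1000.0)


for wh_per_km in (150.0, 200.0):
    print(f"EV at {wh_per_km:.0f} Wh/km: {ev_meters(QUERY_WH, wh_per_km):.1f} m")
for kw in (1.0, 3.5):
    print(f"AC at {kw} kW: {ac_seconds(QUERY_WH, kw):.1f} s")
```

Under those assumptions 1 Wh works out to about 5 to 7 m of driving and about 1 to 3.6 s of AC, so both "under" bounds hold.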

One query is not going to be a useful benchmark when people are deploying AI swarms in loops to solve simple problems.

Or a human riding a stationary bike for 36 seconds.
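The 36-second figure implies a sustained output of 100 W, since 1 Wh is 3600 J. That seems plausible for a casual rider, though it is an assumption. The sketch also scales the per-query figure by a call count, to put the "swarms in loops" point in the same units. The 1,000-call count is purely hypothetical.

```python
# Back-of-envelope: what power does "1 Wh in 36 s" imply, and how long
# would a rider need to pedal to cover many queries?
# ASSUMED: ~1 Wh per query (from the thread); the 1,000-call agent loop
# is a hypothetical example, not a measured workload.

QUERY_WH = 1.0
BIKE_SECONDS = 36.0


def implied_power_watts(energy_wh: float, seconds: float) -> float:
    """Average power needed to deliver energy_wh in the given time (1 Wh = 3600 J)."""
    return energy_wh * 3600.0 / seconds


def bike_seconds(energy_wh: float, rider_watts: float) -> float:
    """Seconds a rider producing rider_watts needs to generate energy_wh."""
    return energy_wh * 3600.0 / rider_watts


watts = implied_power_watts(QUERY_WH, BIKE_SECONDS)
print(f"implied rider output: {watts:.0f} W")

for calls in (1, 1000):  # 1000 is a hypothetical loop size
    secs = bike_seconds(calls * QUERY_WH, watts)
    print(f"{calls} calls: {secs:.0f} s of pedaling ({secs / 3600:.1f} h)")
```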