> using the local model (Ollama) is 'free' in terms of watts since my laptop is on anyway

Now that’s a cursed take on power efficiency

Efficiency is just a mindset. If I save 3 seconds of my own attention by burning 300 watts of GPU, the math works out in my favor!
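As a back-of-envelope sketch of that trade (the 300 W draw and 3 s burst are the numbers from the joke; everything here is illustrative, not measured):

```python
# Rough energy cost of a 3-second local-inference burst.
# All figures are illustrative assumptions, not measurements.
GPU_DRAW_W = 300   # assumed GPU power draw during inference (watts)
BURST_S = 3        # assumed inference duration (seconds)

joules = GPU_DRAW_W * BURST_S   # energy = power * time
kwh = joules / 3_600_000        # 1 kWh = 3.6 MJ
print(f"{joules} J = {kwh:.7f} kWh")
```

So a single burst is on the order of a thousand joules, a rounding error next to the laptop's baseline draw, which is exactly the kind of math that makes "it's free anyway" feel true per-query while saying nothing about the aggregate.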

"works out in my favor" is a pretty poor metric.

If I burn a billion tons of someone else's coal to make myself a paperclip (and don't have to breathe the outputs) it works out in my favor too.