Hacker News

LLMs that you can run locally on hardware that is within reach to buy have been a thing for some time now.

Recently I fired up Gemma4-26B-A4B on my 8-year-old PC... and it ran surprisingly well!

But I am going to need a much beefier machine to get it to the point where it can do anything but very trivial dev tasks acceptably fast, and I'm going to need a much beefier model, perhaps one not so aggressively quantized, to keep it on task without the wheels completely falling off. Already we're talking a serious money outlay, perhaps still within what my programmer salary can accommodate, but just barely. And we're not even anywhere near the performance characteristics a frontier model can support.
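
For a rough sense of scale, here is a back-of-the-envelope sketch, not a benchmark. It assumes the "26B-A4B" in the model name means about 26B total and 4B active parameters, counts weights only (no KV cache, activations, or runtime overhead), and treats decoding as purely memory-bandwidth bound. The 50 GB/s bandwidth figure is a hypothetical placeholder, not a measurement of any real machine.

```python
def weight_memory_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per weight / 8 bits per byte = billions of bytes
    return total_params_billions * bits_per_weight / 8


def decode_tokens_per_sec_ceiling(
    active_params_billions: float,
    bits_per_weight: float,
    bandwidth_gb_per_s: float,
) -> float:
    """Crude upper bound on decode speed.

    Assumes every active weight is read from memory once per generated
    token and that nothing else (compute, cache misses) is the bottleneck.
    """
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_per_s / gb_read_per_token


if __name__ == "__main__":
    # Assumed sizes, read off the model name; not official figures.
    total_b, active_b = 26, 4
    hypothetical_bandwidth = 50  # GB/s, placeholder value

    for bits in (4, 8, 16):
        mem = weight_memory_gb(total_b, bits)
        ceiling = decode_tokens_per_sec_ceiling(active_b, bits, hypothetical_bandwidth)
        print(f"{bits:>2}-bit: ~{mem:.0f} GB of weights, <= {ceiling:.1f} tok/s")
```

Under those assumptions, going from 4-bit to 16-bit weights quadruples the memory needed and cuts the bandwidth-bound speed ceiling to a quarter, which is why a less aggressively quantized model and a beefier machine tend to go together.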

verdverm 16 hours ago

DGX Spark runs models of this size (I personally like qwen36moe better than gemma4moe) at speeds fast enough for interactive coding sessions. Algorithmic advances like DiffusionGemma give ~4x token generation speeds (https://deepmind.google/models/gemma/diffusiongemma/)