Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. Guaranteed if you run the math you will never run enough inference to pay off your hardware vs buying tokens. Last time I ran the math on my MBP I'd have to run inference 24 hours a day for 5+ years to pay off the cost of my MBP, not accounting for electricity costs.
Is this because of the tok/s? Since it's pretty easy to run up a $5k bill in API usage for Claude/ChatGPT in a month.
Yes, because of the limits on tok/s, and you have to compare apples to apples, not Gemma 27B to Opus 4.7.
Assuming the local models get the job done (e.g., you adjust your workflow so that you can run the local machine 100% all the time, or whatever), then the time to payback isn't very high. MSRP for a 128GB AMD was $1400 at launch. That's 7 months of claude code subscription. If you assume a 5 year depreciation cycle, you can buy a cluster of 8 such machines and still come out ahead. (Power is a few hundred watts per machine peak -- maybe 7 machines if you include electricity.) Of course, I'm assuming non-bubble numbers. Those boxes are like $3K now. Still, a normal person would probably not buy 8 of them at once. Instead, they'd space out buying a machine every few years as the technology improves.
For me, things are getting better faster than my ability to review / trust the resulting code, so tok/sec isn't a bottleneck anymore. Instead, quality of the tokens is the bottleneck. That points to me wanting a 1TB DRAM iGPU once they're available at pre-bubble RAM pricing.
You're comparing the highest tier Claude subscription to something Qwen3.5-122B-A10B running locally, apples to oranges.
If you compare to a smarter US model like Grok 4.3, $1400 will pay for 560M output tokens, which at ~25 t/s locally using it nonstop for 8 hours a day would take two years to pay back. Not accounting for bubble prices or electricity.
Is the goal maximum t/s?
According to openrouter, Opus 4.8 is 128 t/s. So 10x faster than my antirez/ds4.
The value of not having a reliance on a third party company, and not needing an internet connection, and having total privacy: ∞
Just have to put some numbers on privacy and autonomy. What's the fine to my company if I get hacked and leak all my customer's PII? What's the cost in productivity lost if OpenAI/Anthropic/Google decides to suspend my account for an unknown reason?