It has everything to do with the way you use it. The biggest difference is how fast the model or service can process context, and everything is context. It's the difference between iterating on an LLM-assisted task for an hour versus five minutes. If your workflow involves chatting with an LLM, manually passing it chunks, manually retrieving the response, manually inserting it, and manually testing....

You get the picture. Sure, even last year's local LLM will do well in capable hands in that scenario.

Now try pushing over 100,000 tokens in a single call, every call, in an automated process. I'm talking about the kind of workflow where you push over a million tokens through in a few minutes, across several steps.
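
To make that concrete, here is a rough sketch of the shape of such a pipeline. `call_llm` is a hypothetical stand-in for whatever client you actually use (local server or public API); the names and prompt format are illustrative, not any particular SDK.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical single LLM call; swap in your real client here."""
    raise NotImplementedError

def run_pipeline(source_docs: list[str], steps: list[str]) -> str:
    # Each step re-sends the full accumulated context: the source material
    # plus every previous step's output. With a large corpus that is easily
    # 100k+ tokens per call, and over several steps it adds up to millions,
    # with no human in the loop to hide the latency.
    context = "\n\n".join(source_docs)
    result = ""
    for instruction in steps:
        prompt = f"{context}\n\nPrevious result:\n{result}\n\nTask: {instruction}"
        result = call_llm(prompt)  # throughput here decides minutes vs. hours
        context += f"\n\n[{instruction}]\n{result}"
    return result
```

In a loop like that, prompt-processing speed is the whole ballgame: the same five-step run that a fast endpoint finishes in minutes can take a local setup the better part of an hour.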

That's where the moat, no, the chasm, between local setups and a public API lies.

No one who does serious work "chats" with an LLM. They trigger workflows where "agents" chew on a complex problem for several minutes.

That's where local models fold.