On-device agentic use is orders of magnitude more demanding than simple chatting (which is already slow for SOTA-sized models): the agent burns through a huge amount of context and tokens just reading code and reasoning about it. It's sort of viable if you set it loose overnight on a fully vibe-coded project, but the results are middling at best. Giving the model feedback interactively is completely out of the question.
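To make the gap concrete, here's a rough back-of-envelope sketch of per-turn latency. Every number in it (token counts per turn, prefill and decode throughput for local vs. hosted inference) is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope sketch: how long one agentic turn takes at different
# inference speeds. All figures below are illustrative guesses.

def turn_seconds(prompt_tokens: int, output_tokens: int,
                 prefill_tps: float, decode_tps: float) -> float:
    """Time for one turn: ingest the prompt, then generate the response."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Assumption: an agent reading a codebase easily feeds ~50k tokens of context
# per turn and emits a few thousand tokens of reasoning and edits.
prompt, output = 50_000, 3_000

local = turn_seconds(prompt, output, prefill_tps=300, decode_tps=15)       # consumer GPU, guess
hosted = turn_seconds(prompt, output, prefill_tps=10_000, decode_tps=100)  # hosted at scale, guess

print(f"local:  ~{local / 60:.0f} min per turn")  # on the order of minutes -> unusable interactively
print(f"hosted: ~{hosted:.0f} s per turn")        # on the order of tens of seconds -> workable
```

Under these assumptions a single turn takes minutes locally versus tens of seconds hosted, and an agentic session is many turns, which is why the interactive loop breaks down on-device.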
Where open models can make a difference for agentic use is with third-party hosted inference at scale, which can actually be fast enough for a reasonable interactive workflow.