The general consensus is that local models will continue to improve drastically, but hosted models will as well. There will _always_ be a pretty big gulf of capability between what you can do with a desk full of hardware at home vs a few racks of hardware in a datacenter. That seems to be the real "moat" of hosted models at this point in time: access to capital.
What's interesting/exciting is that local models are _already_ quite good at tasks we never imagined AI _ever_ doing before ChatGPT hit the scene just a few short years ago.
We're also at an interesting point in time where companies are releasing the fruits of their research/labor (the LLMs) to the general public for free. For now, I think they see it as in their best interest to gain mindshare and rapport, as well as to advance the state of the art in smaller LLMs ("a rising tide lifts all boats"), but I fear and expect that these free releases will dry up as the major players buy the minor players and all of them seek a return on their considerable investments in AI research.
I believe there's a level of diminishing returns. Sure, SOTA will probably always benchmark better than local models. But do we need it? That's the question that the likes of OpenAI and Anthropic should be worried about.
The difference won't be in the individual tasks. It'll be in the scale of the job they can take on and how you interact with the model. Think of pairing with a junior vs replacing a full delivery team: that's the sort of difference we'll be looking at. We'll be able to get closer to the latter by being more clever with harnesses, I reckon, but the frontier labs will run ahead because for any given harness trick they can lean harder on model smarts.
True, but my point is that if/when local models get to the point where they are capable of doing the "delivery team" work, what's next? What can these bigger SOTA models offer? And especially, what can they offer above and beyond what you might be able to get from the much cheaper models that the open models are based on?
That's what I mean by diminishing returns.
There is also the matter of workflow.
We have set up something where you create a ticket, make sure it contains enough information, and, with the right tag added, it will create a branch with a PR for you, which stays up to date based on updates to the ticket and comments on the PR.
It’s creepy in a way. But you also can’t really use a local model (as in a workstation LLM) for that. Sure, we could run something like a distributed task scheduler across all our engineers' devices, but just pushing it to Copilot is easier.