I would love to see an LLM system oriented around trying to "JIT" fast/cheap combinations of tool invocations out of expensive model calls. The case where heartbeat calls Opus, where the transformer model itself doesn't know the time but has access to some tools, should be solvable if Opus has said the equivalent of "it's still night, and to determine this I checked the time using the `date` utility to get the epoch time, and if it had responded with something greater than $X I would have said it's morning". I.e. the smart model can often tell a less capable framework how it can answer the same question cheaply in the future. But incentives aren't really aligned for that at present.
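
A minimal sketch of what replaying that explanation could look like, assuming the framework has already parsed the model's reply into a rule. The threshold constant is a hypothetical stand-in for the "$X" above, and nothing here mirrors any real provider API:

```python
import subprocess

# Hypothetical cached rule, distilled from the expensive model's own
# explanation: past this epoch threshold the answer flips from "night"
# to "morning". The constant stands in for the "$X" in the model's reply.
MORNING_EPOCH_THRESHOLD = 1_760_000_000

def is_it_morning() -> str:
    """Replay the smart model's recipe without calling the model."""
    epoch = int(subprocess.run(["date", "+%s"],
                               capture_output=True, text=True,
                               check=True).stdout.strip())
    return "it's morning" if epoch > MORNING_EPOCH_THRESHOLD else "it's still night"
```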

Well, you can have the less capable system ask the more capable system to give it a tool to solve the problems it doesn't know how to solve, instead of solving them directly. If a problem really needs the more advanced system, that tool might itself call a model.
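
A rough sketch of that shape, where `call_strong_model`, the tool registry, and the `run()` convention are all hypothetical (and `exec` would need sandboxing in any real use):

```python
def call_strong_model(prompt: str) -> str:
    """Stub for the expensive model; wire up a real client here."""
    raise NotImplementedError

tools: dict[str, object] = {}   # problem signature -> generated tool

def solve(problem: str):
    if problem in tools:
        return tools[problem]()              # cheap path: reuse the tool
    source = call_strong_model(
        f"Write a Python function run() that solves: {problem}. "
        "If this genuinely needs a model, run() may call one itself."
    )
    namespace: dict = {}
    exec(source, namespace)                  # sandbox this in real use
    tools[problem] = namespace["run"]
    return tools[problem]()
```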

Or tell you how to use the tools you already have, but sure whatever. The point is, "LLMs that actually do stuff" end up looking partly like expensive logic that decides what to do (like an interpreter) and partly like actually doing it, in terms of some preexisting set of primitives. The expensive logic buys you a high degree of flexibility for less programming cost, but you pay a lot for interpretive overhead. I really do think something that looks like a tracing JIT (plus a standard set of tools or primitives) could find "straight line" paths for a reasonable fraction of calls. But if the interpreter is a service you pay for by the token, and one that is already losing money, why should it be written to do so?
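
Concretely, a trace cache over a fixed primitive set might look like the sketch below. `ask_expensive_model` returning both an answer and the list of primitives it used, and `interpret_cheaply` as the small local step, are assumptions rather than any real API:

```python
import subprocess

# Fixed set of cheap primitives the "compiled" path is allowed to use.
PRIMITIVES = {
    "date": lambda: subprocess.run(["date", "+%s"], capture_output=True,
                                   text=True, check=True).stdout.strip(),
}

trace_cache: dict[str, list[str]] = {}   # request signature -> tool sequence

def ask_expensive_model(request: str) -> tuple[str, list[str]]:
    """Stub: returns (answer, names of primitives it invoked, in order)."""
    raise NotImplementedError

def interpret_cheaply(request: str, outputs: list[str]) -> str:
    """Stub: cheap logic (or a small local model) turns outputs into an answer."""
    raise NotImplementedError

def handle(request: str) -> str:
    sig = request.strip().lower()
    if sig in trace_cache:                        # "straight line" path
        outputs = [PRIMITIVES[name]() for name in trace_cache[sig]]
        return interpret_cheaply(sig, outputs)
    answer, steps = ask_expensive_model(request)  # interpretive path
    trace_cache[sig] = steps                      # record the trace
    return answer
```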

> Or tell you how to use the tools you already have, but sure whatever.

But that's missing the point of the tool-writing example: the models can do this unilaterally, and if prompted properly they will do it today. Writing an agent loop that encourages the agent to build up its own library of tools and prefer that library first, using a small (possibly local) model to select among them, is literally no more than a couple of hundred lines of code.
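
For illustration, a stripped-down version of such a loop might look like this. `small_model_pick`, `big_model_step`, the `tool_library/` directory, and the `run(task)` convention are all hypothetical:

```python
import importlib.util
import pathlib

TOOL_DIR = pathlib.Path("tool_library")

def small_model_pick(task: str, tool_names: list[str]) -> str | None:
    """Stub: a small/local model picks a library tool for the task, or None."""
    raise NotImplementedError

def big_model_step(task: str) -> tuple[str, str | None]:
    """Stub: the expensive model solves the task; may return new tool source."""
    raise NotImplementedError

def load_tools() -> dict:
    """Load every tool the agent has written; each module exposes run(task)."""
    tools = {}
    for path in TOOL_DIR.glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        tools[path.stem] = module.run
    return tools

def agent_step(task: str) -> str:
    tools = load_tools()
    pick = small_model_pick(task, sorted(tools))   # prefer the library first
    if pick in tools:
        return tools[pick](task)                   # cheap path, no big model
    answer, new_tool = big_model_step(task)        # expensive fallback
    if new_tool:                                   # grow the library
        TOOL_DIR.mkdir(exist_ok=True)
        (TOOL_DIR / f"tool_{len(tools)}.py").write_text(new_tool)
    return answer
```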

I wrote a toy coding agent a while back, and I started it with just the ability to run shell commands. Every other tool the model wrote itself. At some point it even tried to restart itself to get access to the tool it had just written.

The providers only control a minuscule proportion of the tooling, and the models themselves are capable enough at writing their own that the providers will never control most of it.

As a result, a lot of effort is going into optimising these agent setups to reduce token use.

But what we're seeing is that the main driver for this is not to spend less but to do more: we get enough value out of these models that every cost-saving measure makes it possible, at a minimum, to spend the same and do more, and often to justify spending even more.

I'm doing projects now that would have been financially unviable just six months ago.