> Models know git because there's a monstrous amount of git in their training data. Models never heard of a new thing "for agents", so you have to teach them to use it via skills and docs.

Another option: when the model invokes a standard tool, rewrite the invocation to use the newfangled tool.

Bunch of ways of doing it:

(a) Invocation of standard tool returns error saying to use newfangled tool instead

(b) Invocation of standard tool returns message saying it has been dynamically rewritten to invoke newfangled tool, followed by newfangled tool output

(c) Invocation of standard tool in context is dynamically rewritten to invocation of newfangled tool, prior to execution

In case (c), the model ends up thinking it somehow knew about this new thing all along, even though it didn’t.
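For what it's worth, here's a rough sketch of how a harness might tell the three apart. Everything in it is made up for illustration: `newvcs`, `REWRITES`, `handle`, and the stub `run_tool` are hypothetical names, not any real harness's API. The thing to look at is what lands in the model's context (`Turn`) in each case.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: list[str]


@dataclass
class Turn:
    call: ToolCall  # the invocation as recorded in the model's context
    output: str     # the result as recorded in the model's context


# Hypothetical mapping from a standard tool to its replacement.
REWRITES = {"git": "newvcs"}


def run_tool(call: ToolCall) -> str:
    """Stand-in executor; a real harness would spawn a process here."""
    return f"[{call.name} {' '.join(call.args)}] ok"


def handle(call: ToolCall, strategy: str) -> Turn:
    new_name = REWRITES.get(call.name)
    if new_name is None:
        # Not a tool we intercept: run it unchanged.
        return Turn(call, run_tool(call))
    rewritten = ToolCall(new_name, call.args)
    if strategy == "a":
        # (a) Refuse and tell the model what to call instead.
        return Turn(call, f"error: {call.name} is unavailable; use {new_name} instead")
    if strategy == "b":
        # (b) Run the new tool, but say so in the output.
        note = f"note: {call.name} was rewritten to {new_name}\n"
        return Turn(call, note + run_tool(rewritten))
    if strategy == "c":
        # (c) Rewrite the recorded invocation itself before executing.
        return Turn(rewritten, run_tool(rewritten))
    raise ValueError(f"unknown strategy: {strategy}")
```

With (a) and (b) the context still shows the original `git` call plus extra explanatory text; with (c) it shows only a `newvcs` call, as if the model had written that itself.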

Options (a) and (b) add more bloat to the model’s context window, and option (c) seems to reduce to having functions similar to ones that already existed. There is also the option of tricking the LLM into thinking it’s using the old function exactly as-is, while the harness abstracts away a completely different methodology. Cursor often does exactly this: they use an internally built vectorized search when the model calls the default “find” bash command. The LLM is none the wiser that the function’s implementation is completely different.
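A toy illustration of that last option, where the tool name and the shape of the result stay fixed and only the backend changes. To be clear, this is not how Cursor or anyone else implements it: `FILES`, `TOOLS`, `call_tool`, and both backends are hypothetical, and a plain token index stands in for a real vector search.

```python
import re
from collections import defaultdict
from typing import Callable

# Hypothetical workspace contents.
FILES = ["src/auth/login.py", "src/auth/token.py", "docs/login.md"]


def substring_find(query: str) -> list[str]:
    """Naive backend: scan every path for the query string."""
    return sorted(p for p in FILES if query in p)


def build_indexed_find(files: list[str]) -> Callable[[str], list[str]]:
    """Replacement backend: look the query up in a prebuilt token index."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in files:
        for token in re.split(r"[/._]", path):
            if token:
                index[token].add(path)

    def indexed_find(query: str) -> list[str]:
        return sorted(index.get(query, set()))

    return indexed_find


# The model only ever sees the name "find" and a list of paths coming back.
TOOLS: dict[str, Callable[[str], list[str]]] = {"find": substring_find}


def call_tool(name: str, query: str) -> list[str]:
    return TOOLS[name](query)


before = call_tool("find", "login")
TOOLS["find"] = build_indexed_find(FILES)  # harness swaps the backend
after = call_tool("find", "login")
```

Nothing about the swap shows up in context: same call, same kind of output, so there's nothing extra for the model to learn or to spend tokens on.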

Regardless, the implementation behind any of these options may be vastly superior to the “naive” implementation for agents. But the parent comment here is right that an engineer would need to justify their implementation to users, not just make a loud conjecture. It’s a non-trivial claim that a bespoke solution, one that is absent from tool-use training data, would result in a better-performing model even after accounting for context rot. Moreover, an agent-specific efficiency gain that humans wouldn’t benefit from makes the claim even more non-trivial. By the Sagan standard (extraordinary claims require extraordinary evidence), it’s then reasonable for people to ask for a comparably non-trivial amount of evidence.