I'm not sure I understand this argument. I create new tools all the time as part of my development work, and I have skills stored that tell agents how to use them. They use them flawlessly.

When I say "benchmark the query engine using the foobar dataset and compare it to run 431", the agents go and run my special benchmark tool and use the different subcommands to compare results and so on.

I'm sure a new VCS would be a little less smooth sailing, but not by much.

I think the issues is, it is going against a very well established pattern. I have a tool that wraps ripgrep so that search results always includes context and from time to time, the agent will use ripgrep by itself and when I ask why, it would go "yeah I should have done that"

There are work arounds though and I am creating what I call knowledge triggers for Pi that are similar to claude's "PreToolUse" so having the agent use oak all the time is not an issue in my opinion.

The challenge for oak is why? Considering how I actually want to slow agents down so I can ensure it is doing the right thing and because the massive bottle kneck is the LLM themselves, speed when measured in milliseconds or even seconds will not concern many.

I thought oak was more of, we know how to prompt inject context based on code that is stored in oak for example, but faster operations can help, but the use case is limited. The missing piece for better/correct code is context at the right time.