Don't see what's wrong with letting an LLM decide which tool to call based on a search over a long list of tools (or a binary tree of lists if the list grows too long, which is essentially what you alluded to with sub-agents)
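To make the idea concrete, here's a minimal sketch of tool selection via search. The tool names, descriptions, and the keyword-overlap scoring are all hypothetical; a real setup would use embedding similarity and could nest lists of lists for the hierarchical case.

```python
# Illustrative sketch (not a real framework): instead of stuffing every
# tool schema into the prompt, score each tool's description against the
# query and hand the model only the best match.

TOOLS = {
    "get_weather": "fetch the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "query_db": "run a read-only SQL query against the analytics database",
}

def select_tool(query: str, tools: dict[str, str]) -> str:
    """Naive keyword-overlap search; a real system would use embeddings."""
    q = set(query.lower().split())
    def score(desc: str) -> int:
        return len(q & set(desc.lower().split()))
    return max(tools, key=lambda name: score(tools[name]))

print(select_tool("what is the weather in Berlin", TOOLS))  # -> get_weather
```

The same function applied recursively to lists of tool *categories* gives you the tree-of-lists variant: search picks a category first, then a tool within it.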

I was referring to letting LLMs search GitHub and run tools from there. That's like randomly searching the internet for code snippets and blindly running them on your production machine.

For that, we need sandboxes to run the code in an isolated environment.
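For example, even without containers you can get a crude process-level sandbox. This is a sketch under stated assumptions (POSIX only, CPython), and it is deliberately minimal: it limits CPU and memory and strips the environment, but does nothing about network access on its own.

```python
# Crude local "sandbox" sketch: run untrusted code in a child process
# with a CPU-time cap, a memory cap, and an empty environment.
# POSIX-only (preexec_fn); real isolation needs containers or VMs.

import subprocess
import sys
import resource

def run_untrusted(code: str, timeout: float = 10.0) -> str:
    def limit() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))              # 2 s CPU
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)   # 512 MB
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, no site dir
        capture_output=True, text=True, env={},
        timeout=timeout, preexec_fn=limit,
    )
    return proc.stdout

print(run_untrusted("print(2 + 2)"))  # -> 4
```

Proper sandboxes layer on top of this: a container or microVM with no network and read-only mounts, which is where the data-security questions below come in.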

Sure, that protects your machine, but what about data security? Do I want to allow unknown code to run on my private/corporate data?

Sandbox all you want, but sooner or later your data can be exfiltrated. My point is that giving an LLM unrestricted access to arbitrary runnable code is a bad idea. Careful curation is my approach.

For data security, you can run the sandbox locally too. See https://github.com/instavm/coderunner