Even with all of the CLI tools at its disposal (e.g. sed), it doesn't consistently use them to make updates as well as it could (e.g. widespread text replacement). Once in a blue moon, an LLM will pick a tool it almost never reaches for and use it in a genuinely smart way to handle a problem. Most of the time, though, it seems optimized for making lots of small individual edits, probably both for safety and because it makes the AI companies more money.
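For concreteness, the kind of edit I mean is a single bulk substitution like `grep -rl getUserByID src | xargs sed -i 's/getUserByID/getUserById/g'`, rather than dozens of per-file edit-tool calls. A rough equivalent as a minimal Python sketch (the identifier names are made up for illustration):

```python
from pathlib import Path

# One pass over the tree, the way a single sed invocation would do it,
# instead of one agent edit-tool call per file. Names are hypothetical.
OLD, NEW = "getUserByID", "getUserById"

for path in Path("src").rglob("*.py"):
    text = path.read_text()
    if OLD in text:
        path.write_text(text.replace(OLD, NEW))
```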

It's because the broader the set of "tools", the worse the model gets at using them effectively. By constraining the set, you get a much higher % of correct usage.

There is a tradeoff between the quantity of tools and the model's ability to make effective use of them. If the tools in an MCP server are defined at a very granular level (i.e. one tool per API call), it's a bad MCP.
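To illustrate the granularity point (the tool names and endpoints here are hypothetical, not from any real server): in MCP, a server advertises each tool as a name, description, and JSON Schema, so the two designs look roughly like this, sketched as Python dicts:

```python
# Granular: one tool per API call -- the model must chain them correctly.
granular_tools = [
    {"name": "get_issue", "description": "Fetch one issue by id",
     "inputSchema": {"type": "object",
                     "properties": {"id": {"type": "string"}},
                     "required": ["id"]}},
    {"name": "get_issue_comments", "description": "Fetch comments for an issue",
     "inputSchema": {"type": "object",
                     "properties": {"id": {"type": "string"}},
                     "required": ["id"]}},
    {"name": "get_user", "description": "Fetch a user profile",
     "inputSchema": {"type": "object",
                     "properties": {"login": {"type": "string"}},
                     "required": ["login"]}},
]

# Coarse: one task-level tool that does the chaining server-side.
coarse_tool = {
    "name": "summarize_issue",
    "description": "Fetch an issue, its comments, and participants, "
                   "and return a digest the model can use directly.",
    "inputSchema": {"type": "object",
                    "properties": {"id": {"type": "string"}},
                    "required": ["id"]},
}
```

The coarse version moves the chaining of API calls into the server, so the model only has to pick one tool instead of planning across three.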

I imagine you run into something similar with bash - while bash is a single "tool" for an agent, a similar decision still needs to be made about the many CLI tools that enabling bash exposes.
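One way harnesses make that decision explicit is an allowlist in front of the shell. A minimal sketch (the allowed set and the function name are assumptions, not any particular agent's implementation; pipes and redirection would need real command parsing, which this skips):

```python
import shlex
import subprocess

# Commands the agent's single "bash" tool may start with. Deciding what
# belongs here is exactly the tradeoff discussed above, one level down.
ALLOWED = {"ls", "cat", "grep", "sed", "rg", "git"}

def run_bash(command: str) -> str:
    """Run a command only if its first word is allowlisted."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr
```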