I've always wondered: doesn't the fact that MCP input/output is more structured lead to higher reliability? With MCP you declare the types for the input (string, int, list, etc.) and for the output.
As part of our product, we have an MCP server. Since many of our MCP tools are expensive, our tests simply give the LLM all the tool descriptions (in text form, not structured), ask it which tool it would call for a given query, and assert on the response.
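One way to check whether the text form loses information is to compare it with the structured definition. Below is a minimal sketch, assuming a hypothetical `search_orders` tool and a hypothetical `describe_tool_as_text` helper. The dict mirrors the `name` / `description` / `inputSchema` fields an MCP server returns from `tools/list`. The text layout is an assumption; the idea is to carry each parameter's declared type into the test prompt so it sees roughly what the real structured call would.

```python
# Hypothetical tool definition, shaped like an entry in an MCP tools/list
# result: a name, a description, and a JSON Schema describing the input.
SEARCH_TOOL = {
    "name": "search_orders",
    "description": "Search customer orders by status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
            "limit": {"type": "integer"},
        },
        "required": ["status"],
    },
}


def describe_tool_as_text(tool: dict) -> str:
    """Flatten a tool definition into plain text for a test prompt,
    keeping each parameter's declared type, enum values and required flag."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    lines = [
        f"Tool: {tool['name']}",
        f"Description: {tool['description']}",
        "Arguments:",
    ]
    for name, spec in schema.get("properties", {}).items():
        kind = spec.get("type", "any")
        if "enum" in spec:
            kind += " (one of: " + ", ".join(map(str, spec["enum"])) + ")"
        flag = "required" if name in required else "optional"
        lines.append(f"  - {name}: {kind}, {flag}")
    return "\n".join(lines)


print(describe_tool_as_text(SEARCH_TOOL))
```

If the flaky runs mostly disappear once types are spelled out like this, that points at missing type information in the prompt rather than at MCP's structure as such.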
The tests are flaky. In practice, I've always seen the LLM make the right tool call with properly formatted args. In the tests (same model), it occasionally gets the argument types wrong and has to try again before it gets it right.
My assumption was that the structure MCP provides explains the discrepancy.
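In a test like this, it can help to separate "picked the wrong tool" from "picked the right tool with mistyped arguments" by checking the proposed arguments against the tool's declared input schema, roughly what a client or server could do before executing a real call. Below is a minimal sketch, assuming a hypothetical schema and a hypothetical `check_arguments` helper that covers only a small subset of JSON Schema (`required`, `additionalProperties: false`, primitive `type`, `enum`, array `items`). A full JSON Schema validator would be the more thorough choice.

```python
# Hypothetical input schema for a tool (a JSON Schema subset, like the
# inputSchema an MCP tool declares).
SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
        "limit": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["status"],
    "additionalProperties": False,
}

# JSON Schema primitive type names mapped to Python types.
# Unrecognized type names (e.g. "null") are not checked.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def _type_ok(value, type_name):
    # bool is a subclass of int in Python, so reject it for numeric types.
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False
    expected = _TYPES.get(type_name)
    return expected is None or isinstance(value, expected)


def check_arguments(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with a proposed tool call's arguments."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            # Extra keys are only an error when the schema forbids them.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unknown argument '{name}'")
            continue
        if "type" in spec and not _type_ok(value, spec["type"]):
            errors.append(f"'{name}' should be {spec['type']}, got {type(value).__name__}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{name}' must be one of {spec['enum']}")
        if spec.get("type") == "array" and "items" in spec:
            item_type = spec["items"].get("type")
            for i, item in enumerate(value):
                if item_type and not _type_ok(item, item_type):
                    errors.append(f"'{name}[{i}]' should be {item_type}")
    return errors
```

With this, a call like `{"status": "open", "limit": "10"}` is reported as `'limit' should be integer, got str` instead of just failing the assertion, so the flaky runs can be bucketed by kind of mistake.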
This may be one of the areas where MCP is OK-ish, though at a huge cost to context.
As I and others have pointed out: The context problem with MCP is mostly solved.
See https://news.ycombinator.com/item?id=47719249 for an example I gave.