Hacker News

I've always wondered: doesn't the fact that MCP input/output is more structured lead to higher reliability? With MCP you declare the types for inputs (string, int, list, etc.) and outputs.
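To make "declaring the types" concrete, here is a rough sketch. An MCP tool definition carries a JSON Schema under `inputSchema`; the tool name, its fields, and the `arg_errors` helper below are all made up for illustration, and the checker covers only a tiny subset of JSON Schema.

```python
# Hypothetical tool definition; the name and fields are invented for illustration.
SEARCH_TOOL = {
    "name": "search_orders",
    "description": "Search orders by customer and limit the number of results.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer": {"type": "string"},
            "limit": {"type": "integer"},
            "tags": {"type": "array"},
        },
        "required": ["customer"],
    },
}

# Map the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "integer": int, "array": list, "object": dict, "boolean": bool}


def arg_errors(tool, args):
    """Return a list of problems with `args` (required fields and basic types only)."""
    schema = tool["inputSchema"]
    errors = [f"missing required arg: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown arg: {key}")
            continue
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{key}: expected {spec['type']}, got {type(value).__name__}")
    return errors
```

With a schema like this, an argument such as `"limit": "5"` can be flagged mechanically, which is the kind of type mistake a plain-text description leaves the model to infer on its own.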

As part of our product, we have an MCP server. Many of our MCP tools are expensive to run, so our tests don't execute them. Instead, we give the LLM all the tool descriptions (as plain text, not in structured form), ask it which tool it would call for a given query, and assert on the response.
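For anyone curious, the test setup described above looks roughly like the sketch below. Everything here is a stand-in: the tool catalogue, the prompt wording, and `fake_llm` (which replaces the real model call so the snippet runs offline) are assumptions, not our actual code.

```python
import json

# Hypothetical tool catalogue; real descriptions would come from the MCP server.
TOOLS = [
    {"name": "search_orders", "description": "Find orders. Args: customer (string), limit (int)."},
    {"name": "refund_order", "description": "Refund an order. Args: order_id (string)."},
]


def render_tools_as_text(tools):
    """Flatten tool descriptions into plain text, one bullet per tool."""
    return "\n".join(f"- {t['name']}: {t['description']}" for t in tools)


def choose_tool(llm, tools, query):
    """Ask the model which tool it would call; `llm` is any callable prompt -> str."""
    prompt = (
        "Available tools:\n" + render_tools_as_text(tools) + "\n\n"
        f"Query: {query}\n"
        'Reply with JSON only: {"tool": <name>, "args": {...}}'
    )
    return json.loads(llm(prompt))


def fake_llm(prompt):
    # Stand-in for a real model call so the sketch runs without network access.
    return '{"tool": "search_orders", "args": {"customer": "acme", "limit": 5}}'


choice = choose_tool(fake_llm, TOOLS, "Show the last 5 orders for acme")
assert choice["tool"] == "search_orders"
assert isinstance(choice["args"]["limit"], int)
```

Note that in this setup the argument types exist only as prose inside the description strings, so nothing constrains the shape of the model's reply.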

The tests are flaky. In practice, I've always seen the LLM make the right tool call with properly formatted args. In the tests (same model), it occasionally gets the argument types wrong and has to retry before it gets them right.

My assumption was that the structure MCP provides explains the discrepancy.