A good MCP server makes the difference between an agent using 20k tokens and 2 million. It may not matter yet with sponsored Codex and Claude subscriptions, but it will kill many use cases once providers switch to token-based billing.
That may sound like an exaggeration, but it’s exactly what I see in our product.
Humans developing something already have context that agents don’t have yet. Most agents start a task with virtually no prior knowledge. And they start from zero every single time. That may improve in the future, but we’re not there yet.
Can agents get the job done? Yes. But without a thoughtfully implemented MCP server, they are awkwardly inefficient.
Seems to me that you're saying the MCP is a simplified API with good documentation geared towards agents. But if that's the case, could you not exposes the simplified interface as part of the API, instead of exposing it in MCP?
When it is part of the API, the agent still has the choice between multiple options. If it chooses the less efficient one, the request can become significantly more CPU- and token-intensive than necessary.
The problem is that the agent does not care. Its primary goal is to get the job done.
Maybe the agent is smart enough to choose the optimal path, but that strongly depends on the model being used. You also do not know who is on the other side. With a human-facing API, you can usually assume who is using it and what they want to achieve. Humans are generally lazy and tend to look for the most efficient solution.
An agent, however, will happily iterate through 1,000 users and fetch the online state for each one individually, even across multiple paginated requests if necessary.
You can provide an endpoint that returns the online states for all users at once. A human will most likely use that endpoint, but I have seen agents go completely wild on the other side. :D
At some point, you may get a response like “token limit reached.” But what do you do then? You give the agent more tokens and increase your bill, because you cannot even tell whether there was a more optimized way to achieve the same result.
In practice, this is a surprisingly tricky problem. :D