The agent cannot compose MCPs.
What it can do is call multiple MCPs, dump tons of crap into the context, and then separately run some analysis on that data.
Composable MCPs would require some sort of external sandbox in which the agent can write small bits of code to transform and filter the results from one MCP to the next.
This is confusing to me. What is composability if not calling a program, getting its output, and feeding that into another program as input? Why does it matter whether that output is stored in the LLM's context, in a file, or ephemerally?
Maybe I'm misunderstanding the definition of composability, but it sounds like your issue isn't that MCP isn't composable, but that it's wasteful because it adds data from intermediate steps to the context. But there are numerous ways to circumvent this.
For example, it wouldn't be hard to create a tool that just runs an LLM, so when the main LLM convo calls this tool it's effectively a subagent. This subagent can do work, call MCPs, store their responses in its context, and thereby feed that data as input into other MCPs/CLIs, and continue in this way until it's done with its work, then return its final result and disappear. The main LLM will only get the result and its context won't be polluted with intermediary steps.
This is pretty trivial to implement.
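A minimal sketch of the subagent-as-a-tool idea in Python. Everything here is hypothetical: `search_issues` and `get_issue_comments` stand in for MCP tools, and the hard-coded steps inside `run_subagent` stand in for the subagent's own model calls. What it shows is that the subagent's `context` collects every intermediate response, while `main_context` only ever receives the one-line summary.

```python
# Hypothetical stand-ins for MCP tools; a real setup would forward these
# calls to MCP servers instead of returning canned data.
def search_issues(query: str) -> list[dict]:
    return [
        {"id": 1, "title": "Crash on start", "labels": ["bug"]},
        {"id": 2, "title": "Add dark mode", "labels": ["feature"]},
        {"id": 3, "title": "Memory leak", "labels": ["bug"]},
    ]

def get_issue_comments(issue_id: int) -> list[str]:
    # Deliberately bulky response, the kind of thing that pollutes context.
    return [f"comment {n} on issue {issue_id}" for n in range(50)]

def run_subagent(task: str) -> str:
    """Hypothetical 'LLM as a tool'. A real version would loop over model
    calls; the steps are hard-coded here so the sketch runs on its own."""
    context: list[str] = [f"task: {task}"]  # the subagent's private context
    issues = search_issues("is:open")
    context.append(repr(issues))
    bugs = [i for i in issues if "bug" in i["labels"]]
    for bug in bugs:
        # Intermediate results accumulate here, not in the caller's context.
        context.append(repr(get_issue_comments(bug["id"])))
    # Only this summary leaves the subagent; `context` is dropped on return.
    return ", ".join(f"#{b['id']} {b['title']}" for b in bugs)

main_context = ["user: which open issues are bugs?"]
result = run_subagent("list open bug reports")
main_context.append(f"tool(run_subagent): {result}")
```

The main conversation grows by exactly one entry per subagent call, no matter how many tool calls the subagent made internally.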
Give the model an interpreter like mlua and let it write code to compose MCP calls together. This is a well-established method.
It's the equivalent of calling CLIs in bash, except mlua is a sandboxed runtime while bash is not.
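Roughly what that looks like, sketched in Python rather than Lua, with `exec` standing in for an embedded runtime like mlua. The tools (`list_files`, `file_size`), the script, and `run_script` are made up for illustration. Only `result` goes back to the model; the file list never enters its context. Stripping builtins like this is not a real security boundary, so an actual deployment would need a properly isolated runtime.

```python
# Hypothetical tools exposed to model-written code.
def list_files(repo: str) -> list[str]:
    return ["src/main.rs", "src/lib.rs", "README.md", "docs/intro.md"]

def file_size(path: str) -> int:
    sizes = {"src/main.rs": 1200, "src/lib.rs": 5400,
             "README.md": 800, "docs/intro.md": 300}
    return sizes[path]

# Code the model might write: call one tool, filter, feed into another.
MODEL_SCRIPT = """
rust_files = [p for p in list_files("demo") if p.endswith(".rs")]
result = max(rust_files, key=file_size)
"""

def run_script(script: str, tools: dict) -> object:
    # Only the tools and a few whitelisted builtins are visible to the script.
    # NOT a security sandbox; it just illustrates the shape of the approach.
    env = {"__builtins__": {"max": max, "len": len, "sorted": sorted}, **tools}
    exec(script, env)
    return env["result"]  # the only value returned to the model's context

answer = run_script(MODEL_SCRIPT, {"list_files": list_files, "file_size": file_size})
```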
At the level of the agent, it knows nothing about MCP, all it has is a list of tools. It can do anything the tools you give it let it do.
It cannot do "anything" with the tools. Tools are very constrained in that the agent must insert the tool call into its context, and it can only receive the tool's response directly back into its context.
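To make that constraint concrete, here is a sketch of a plain tool-calling loop in Python, assuming a hypothetical `fetch_logs` tool and modeling the context as a list of strings. Both the call and the full response are appended to the context, so filtering 100 log records down to the errors means reading all 100 first.

```python
import json

def fetch_logs(service: str) -> list[dict]:
    # Hypothetical MCP tool that returns 100 records, 10 of them errors.
    return [
        {"service": service, "line": n, "level": "ERROR" if n % 10 == 0 else "INFO"}
        for n in range(100)
    ]

def call_tool(context: list[str], tools: dict, name: str, args: dict) -> None:
    # Plain tool calling: the call and the complete response are both
    # appended to the model's context; the model has no other way to see it.
    context.append(json.dumps({"tool_call": name, "args": args}))
    context.append(json.dumps({"tool_result": tools[name](**args)}))

context: list[str] = []
tools = {"fetch_logs": fetch_logs}
call_tool(context, tools, "fetch_logs", {"service": "api"})
call_tool(context, tools, "fetch_logs", {"service": "db"})
```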
Tools themselves also cannot be composed by any SOTA model. Composition is not a feature the tool schema supports, and models are not trained on it.
Models obviously understand the general concept of function composition, but we don't currently provide the environments in which this is actually possible outside of highly generic tools like Bash or sandboxed execution environments like https://agenttoolprotocol.com/
They can already do this, no? MCPs regularly dump their results to a text file, and other tools (CLI or otherwise) filter it.
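A sketch of that workaround in Python, with hypothetical tools: `fetch_logs_to_file` writes its results to disk and returns only the path, and `grep_file` plays the role of grep. The context sees a path and 10 matching lines instead of all 100 records.

```python
import json
import tempfile
from pathlib import Path

def fetch_logs(service: str) -> list[dict]:
    # Hypothetical data source: 100 records, every tenth one an error.
    return [
        {"service": service, "line": n, "level": "ERROR" if n % 10 == 0 else "INFO"}
        for n in range(100)
    ]

def fetch_logs_to_file(service: str, out_dir: Path) -> str:
    # Hypothetical tool variant: write results as JSON lines, return the path.
    path = out_dir / f"{service}.jsonl"
    with path.open("w") as f:
        for record in fetch_logs(service):
            f.write(json.dumps(record) + "\n")
    return str(path)

def grep_file(path: str, needle: str) -> list[str]:
    # Hypothetical filter tool, analogous to running grep on the file.
    with open(path) as f:
        return [line.rstrip("\n") for line in f if needle in line]

with tempfile.TemporaryDirectory() as d:
    path = fetch_logs_to_file("api", Path(d))     # context gets a short path
    errors = grep_file(path, '"level": "ERROR"')  # context gets 10 lines
```

Whether this counts as "composition" is the question the thread is arguing about: the filtering step still has to be chosen by the model one tool call at a time.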