I could not disagree more with the author. I don’t want APIs; I want agents to use the same CLI tooling I already use, which is locally available. If my agents are using CLI tooling anyway, there is no need to add an extra layer via MCP.
I don’t want remote MCP calls; I don’t even want remote models, but avoiding those is cost-prohibitive.
If I need to call an API, a skill with existing CLI tooling is more than capable.
I often just put direct curl commands in a skill, the agent uses that, and it works perfectly for custom API integrations. Agents are perfectly capable of doing these types of things, and it means the LLM just uses a flexible set of tools to achieve almost anything.
I think this is the best of both worlds. Design a sane API (that is easy to consume for both humans and agents), then teach the agents to use it with a skill.
But I agree with the author on custom CLI tooling. I don’t want to install another opaque binary on my machine just to call some API endpoints.
Obviously, opaque binaries are hardly an improvement over MCP, but providing a few curl + jq one-liners to interact with a REST API works great in my experience. It also means no external scripts, just a single markdown file.
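To make the "single markdown file" idea concrete, here is a toy sketch of what such a skill could look like. Everything in it is made up: the endpoint, the `ISSUES_TOKEN` variable, and the response fields are stand-ins, not a real API.

```shell
# Hypothetical single-file skill: the whole integration is just
# documented curl + jq one-liners in a markdown file.
cat > SKILL.md <<'EOF'
# Issues API skill

List open issue titles:

    curl -s -H "Authorization: Bearer $ISSUES_TOKEN" \
      "https://api.example.com/v1/issues?state=open" | jq -r '.items[].title'

Close an issue by id:

    curl -s -X PATCH -H "Authorization: Bearer $ISSUES_TOKEN" \
      -d '{"state":"closed"}' "https://api.example.com/v1/issues/42"
EOF
first=$(head -n 1 SKILL.md)
echo "$first"
```

The agent reads the file, substitutes the id or filter it needs, and runs the command; no server, no extra binary.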
With a good CLI, an agent may be able to do something outside the scope of its skill fairly easily, by running help commands or similar. Even with a well-written API, it is not as easy.
I suppose that curl + API docs could replace a CLI, but that's really token-inefficient.
I keep getting hung up on securely storing and using secrets with CLI vs MCP. With MCP, you can run the server before you run the agent, so the agent never even has the keys in its environment. That way, if the agent decides to install the wrong npm package that auto-dumps every secret it can find, you are less likely to have one sitting around. I haven’t figured out a good way to guarantee that with CLIs.
A CLI can just be an RPC call to a daemon; the exact same pattern applies. In fact, my most important CLI-based skills are like this: a CLI by itself is limited in usefulness.
That was the same conclusion I reached! However, this also gave me some evidence that maybe I wanted MCP? I realized that my pattern was going to be:
Step 1) Run a small daemon that exposes a known protocol (HTTP, JSON-RPC, whatever you want) over a unix socket. When I run the daemon, IT is the only thing that has the secrets. Cool! Step 2) Have the agent run a CLI that knows how to speak that protocol behind the scenes, knows how to find the socket, and exposes the capabilities via standard CLI conventions.
It seems like one of the current "standards" for unix socket setups like this is to use HTTP as the protocol. That makes sense. It's ubiquitous, easy to write servers for, easy to write clients for, etc. That's how docker works (for whatever it's worth). So you've solved your problem! Your CLI can be called directly without any risk of secret exposure. You can point your agent at the CLI, and the CLI's "--help" will tell the agent exactly how to use it.
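A toy version of the two steps, shrunk down so it runs anywhere. The socket path and the one-line "protocol" are stand-ins; a real daemon would speak HTTP (so `curl --unix-socket` works, docker-style) or JSON-RPC. The point is only that the secret lives in the daemon's environment, never the CLI's.

```shell
sock="./demo.sock"
rm -f "$sock"
# Step 1: the "daemon", started by me, is the only process holding the secret.
SECRET=hunter2 python3 - "$sock" <<'EOF' &
import os, socket, sys
s = socket.socket(socket.AF_UNIX)
s.bind(sys.argv[1])
s.listen(1)
conn, _ = s.accept()
conn.recv(1024)  # read the CLI's request
# privileged work uses the secret here; only the result crosses the socket
ok = os.environ.get("SECRET") is not None
conn.sendall(b"status: authenticated\n" if ok else b"status: no secret\n")
conn.close()
EOF
sleep 1
# Step 2: the thin "CLI" the agent actually runs; note its env has no secret.
# (If the daemon spoke HTTP, curl could talk to it directly:
#   curl -s --unix-socket "$sock" http://localhost/v1/status )
reply=$(python3 - "$sock" <<'EOF'
import socket, sys
c = socket.socket(socket.AF_UNIX)
c.connect(sys.argv[1])
c.sendall(b"GET status\n")
print(c.recv(1024).decode().strip())
EOF
)
echo "$reply"
wait
```

The agent can be handed the client half as a normal CLI with a `--help`, and dumping its environment reveals nothing.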
But then I wondered if I would have been better off making my "daemon" an MCP server, because it's a self-describing http server that the agent already knows how to talk to and discover.
In this case, the biggest thing that was gained by the CLI was the ability of the coding agent to pipe results from the MCP directly to files to keep them out of its context. That's one thing that the CLI makes more obvious and easy to implement: Data manipulation without context cluttering.
In other words, a wrapper around an MCP that's less verbose.
MCP is a wrapper around it. The CLI-daemon RPC pattern is much older and is used all over the place in modern systems.
"MCP" here is not needed.
And in a skill, I can store the secret in the skill itself, or a secure storage the skill accesses, and the agent never gets to see the secret.
Sure, if I want my agents to use naked curl on the CLI, they need to know secrets. But that's not how I build my tools.
what stops the agent from echoing the secure storage?
What I see is that you give it a password manager, it thinks, "oh, this doesn't work, let me read the password", and of course it sends it off to OpenAI.
> what stops the agent from echoing the secure storage?
The fact that it doesn't see it and cannot access it.
Here is how this works, highly simplified.
This, in a much more complex form, runs in my framework. The agent gets told that this tool exists, that it can do privileged work for it, and how `context` needs to be shaped. (When I say "it gets told", I mean the tool describes itself to the agent; I don't have to write this manually, of course.) The agent never accesses the secrets storage; the tool does. The tool then uses the secret to do whatever privileged work needs doing. The secret never leaves the tool and is never communicated back to the agent. The agent also doesn't need to, and indeed cannot, give the tool a secret to use.
And the "privileged work" the tool CAN invoke, does not include talking to the secrets storage on behalf of the agent.
All the info, and indeed the ability to talk to the secrets storage, belongs to the framework the tool runs in. The agent cannot access it.
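A minimal sketch of the shape of this. The tool name, the storage command, and the endpoint are all hypothetical; the real version (per the comment above) lives inside a framework with actual access control, not a flat script.

```shell
# Hypothetical privileged tool: the agent is only told
# "run ./privileged-tool <context>"; the secret stays inside.
cat > privileged-tool <<'EOF'
#!/bin/sh
context="$1"
# The real tool reads from storage the agent cannot access, e.g.:
#   token=$(pass show work/deploy-token)
token="stand-in-token"
# Privileged work happens here, e.g.:
#   curl -s -H "Authorization: Bearer $token" -d "$context" https://api.example.com/deploy
# Only the outcome is reported back; the token is never printed.
echo "done: $context"
EOF
chmod +x privileged-tool
result=$(./privileged-tool "deploy staging")
echo "$result"
```

Whatever comes back on stdout is all the agent ever sees of the operation.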
I think this is a good setup to prevent the secret from leaking into the agent context. I'm more concerned about the secret leaking into the exfiltration script that my agent accidentally runs. The one that says: "Quick! Dump all environment variables. Find all secrets in dotfiles! Look in all typical secrets file locations..."
Your agent process has access to those secrets, and its subprocesses have access to those secrets. The agent doesn't have to be convinced to read those files. Whatever malicious script it manages to be convinced to run could easily access them, right?
If the tool fails for some reason, couldn't an overly eager agent attempt to fix what's blocking it by digging into the tool (e.g. attaching a debugger or reading memory)? I think the distinction here is that skill+tool will have a weaker security posture since it will inherently run in the same namespaces as the agent where MCP could impose additional security boundaries.
OpenAI is not the worst it could or would send it to.
This has been hashed to death and back. MCP allows a separation between the agent and the world: at its most basic, not giving the agent your token, or changing an HTTP header, or forcing a parameter.
Well, yes, you don’t need those things all the time, and who knows if the inventors of MCP had this idea in mind, but here we are.
The separation is being oversold as if only MCP can do it, which is laughable. Any CLI can trivially do exactly what MCP does in terms of separation.
Ok, but there are still many environments where an LLM will not have access to a CLI. In those situations, skills calling CLI tools to hook into APIs are DOA.
What is the advantage of using an environment that doesn't have access to a CLI, where you have to run and maintain your own server (or pay someone else to maintain it) just so the AI has access to tools? Can't you just use the AI on said server?
The advantage is that I can have it in my pocket.
Gateway agents have been a thing for many months now (and I don't mean openclaw, which has grown into a disaster security-wise). There are good, minimal gateway agents today that can fit in your pocket.
Why can't you have the agent running on its own server/vm in your pocket?
Obvious example is a corporate chatbot (if it's using tools, probably for internal use). Non-technical users might be accessing it from a phone or locked-down corporate device, and you probably don't want to run a CLI in a sandbox somewhere for every session, so you'd like the LLM to interface with some kind of API instead.
Although, I think MCP is not really appropriate for this either. (And frankly I don't think chatbots make for good UX, but management sure likes them.)
Why are they not calling APIs directly with strictly defined inputs and outputs like every other internal application?
The story for MCP just makes no sense, especially in an enterprise.
MCP is an API with strictly defined inputs and outputs.
This is obviously not what it is. If I give you APIGW would you be able to implement an MCP server with full functionality without a large amount of middleware?
I’ve implemented an MCP tool-calling client for my application, alongside OAuth for it. It was hard, but no harder than anything else similar. I implemented a client for inference with the OpenAI API spec for general inference providers, and it was similarly hard. MCP SDKs help make it easy; MCP servers are dead simple. Clients are the hard part, IMO.
MCP is basically just an RPC API that uses HTTP and JSON, with some other features useful for AI agents today.
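For the concrete flavor of "RPC API over HTTP and JSON": an MCP request is a JSON-RPC 2.0 message in an HTTP POST body (`tools/list` is a standard MCP method; the server URL below is made up).

```shell
# What the wire format looks like, roughly:
#   curl -s -X POST https://mcp.example.com/mcp \
#     -H 'Content-Type: application/json' \
#     -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
# The payload itself is plain JSON:
req='{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
echo "$req" | python3 -m json.tool
```

Nothing exotic at this layer; the MCP-specific parts are the method names, the tool schemas in the responses, and the session handling around them.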
If I gave you that, could you implement GraphQL from scratch without a large amount of middleware? Or are we now saying GraphQL APIs are not APIs?
Sorry, could you rephrase that?
Does MCP support authentication, SSO?
Yes, it’s literally just standard OAuth as defined in the MCP spec. I spent this week implementing an auth layer for my app’s MCP client gateway.
It supports OAuth, IIRC. But I suppose the internal chatbot itself would require auth, and pass that down to the tools it calls.
The chatbot app initiates an OAuth flow, user SSOs, chatbot app receives tokens to its callback URL, then tool calls can access whatever the user can access.
If you use the official MCP SDK, it has interfaces you implement for auth, so all you need to do is kick off the OAuth flow with a URL it figures out and hands you, store the resulting tokens, and produce them when requested. It also handles using refresh tokens, so there's just a bit of light glue code on top.
Source: I just implemented this for our (F100) internal provider- and model-agnostic chat app. People can't seem to see past the coding agents they're running on their own machines when MCP comes up.
Neat!
MCP really only makes sense for chatbots that don’t want to have per session runtime environments. In that context, MCP makes perfect sense. It’s just an adapter between an LLM and an API. If you have access to an execution engine, then yes CLI + skills is superior.
Actually, local MCP just spawns a subprocess and talks to it via stdin/stdout, same as a CLI tool. The extra layer is only for the remote case.
This might help if interested - https://vectree.io/c/implementation-details-of-stdio-and-sse...
"Only" is doing a lot of work here. There are tons of use cases aside from local coding assistants, e.g., non-code-related domain-specific agentic systems; these don’t even necessarily have to be chatbots.
OP's point is about per-session sandboxes, not them necessarily being "chatbots". But if you don't bury the agent in a fresh sandbox for every session, you have bigger problems to worry about than MCP vs CLI anyway.
> and you probably don't want to run a CLI in a sandbox somewhere for every session
You absolutely DO want to run everything related to LLMs in a sandbox, that's basic hygiene
You're missing their point, they're saying that you'd need a sandbox -> it'd be a pain -> you don't want to run a CLI _at all_
idk, just have a standard internet request tool that skills can describe endpoints to. You could even mock `curl` for the same CLI feel.
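A rough sketch of that shim idea, with everything made up: a fake `curl` on the agent's PATH that hands the request to a controlled request tool instead of hitting the network itself.

```shell
# "Mock curl": a shim that forwards to the harness's request tool.
mkdir -p shim
cat > shim/curl <<'EOF'
#!/bin/sh
# a real shim would enforce an endpoint allowlist before forwarding
echo "request-tool would fetch: $*"
EOF
chmod +x shim/curl
# the agent thinks it's running curl; the shim intercepts it
got=$(PATH="$PWD/shim:$PATH" curl "https://api.example.com/v1/items")
echo "$got"
```

The agent keeps the familiar CLI ergonomics while every request actually goes through a layer you control.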
Now you’ve replicated MCP but with extra steps and it’s harder to debug.
It's actually simpler, since the skill can be 100% an MD file.
skills can have code bundled with them, including MCP code
The agent still doesn’t have an execution environment. It can’t execute the code!
well that's harness territory! give it the right harness/environment!!
Whoosh.
Cool cool. Except.
What about auth? Authn and authz. Should the agent always be you? If not, does every API support keys? If so, no fears about context-poisoned agents leaking those keys?
One thing an MCP (server) gives you is a middleware layer to control agent access. Whether you need that is use-case dependent.
That's not a limitation of CLIs, they can work with a different auth as well.
They are just a superior tool to MCP, because the agent can write code that invokes, pipes, and does many other things with the tool.
Also resources - which are by far the coolest part of MCP. Prompts? Elicitation? Resource templates? If you think of MCP as only a replacement for tool calls I can see the argument but it's much more than that.
> If not, every API supports keys?
How would MCP help you if the API does not support keys?
But that's not the point. The agent calls CLI tools, which reads secrets from somewhere where the agent cannot even access. How can agent leak the keys it does not have access to?
You ARE running your agents in containers, right?
> How would MCP help you if the API does not support keys?
Kerberos, OAuth, Basic Auth (username/password), PKI. MCP can be a wrapper (like any middleware).
> But that's not the point. The agent calls CLI tools, which reads secrets from somewhere where the agent cannot even access. How can agent leak the keys it does not have access to?
If the CLI can access the secrets, the agent can just reverse-engineer it and get the secret itself.
> You ARE running your agents in containers, right?
Do you inject your keys into the container?
> If the cli can access the secrets, the agent can just reverse it and get the secret itself.
What do you mean by this? How "reverse it"? The CLI tool can access the secure storage, but that does not mean there is any CLI interface in the tool for the LLM to call and get the secret printed into the console.
In principle it could use e.g. gdb and step through until it gets the secret. Or it can know ahead of time where the app stores the credentials.
We could use setuid binaries (e.g. sudo) to prevent that, but currently I don't think we can. Most anyone would agree that using a separate process, for which the agent environment provides a connection, is a better solution.
what you want and what works may be very different things.