Shows how much room for improvement there is on the harness level.
Agents waste a lot of tokens on editing, sandboxes, passing info back and forth from tool calls and subagents.
Love the pragmatic mix of content based addressing + line numbers. Beautiful.
Indeed. The biggest waste might be the overuse of MCP for everything. Sure it makes the initial development easier but then for every connection you're using a hundred billion dollar parameter model to decide how to make the call when it's usually completely unnecessary and then prone to random errors. MCP is the hammer that can make literally everything look like a nail...
I see this ranting against MCP all the time, and I don't get it, maybe I'm missing something. I'm currently using an MCP in Cursor to give agents read-only access to my staging and prod databases, as well as BugSnag's MCP so it can look up errors that happen in those environments. It works great. What should I be using for this if not MCP?
Make a CLI tool for it, of course
What? Why? What advantage does that have over just using an MCP server that exposes tools to run queries?
Context.
Why would I use an MCP when I can use a cli tool that the model likely trained on how to use?
Can you be more specific about “context”?
And not everything has a CLI, but in any case, the comment I was replying to was suggesting building my own CLI, which presumably the LLM wasn’t trained on.
Maybe my understanding of MCP is wrong, my assumption is that it’s a combination of a set of documented tools that the LLM can call (which return structured output), and a server that actually receives and processes those tool calls. Is that not right? What’s the downside?
agent skills, or use claude code to iteratively condense an MCP you want to use into only its most essential tools for your workflow
Agent skills are just a markdown file, what’s in that markdown file in your scenario?
And the MCP already only has the most essential tools for my workflow: the ability to run queries against a few databases.
i haven't dug into the article but your comment reminded me about the ClaudeCode Superpowers plugin. I find the plugin great but it's quite "expensive", I use the pay-as-you-go account with CC because i've just been trying it out personally and the superpowers plugin spends a lot of money, relative to regular CC, with all the back and forth.
With CC you can do a /cost to see how much your session cost in dollar terms, that's a good benchmark IMO for plugins, .md files for agents, and so on. Minimize the LLM cost in the way you'd minimize typical resource usage on a computer like cpu, ram, storage etc.
you can actually go the other way and spend more tokens to solve more complex problems (multi-agent) by letting agents work with smaller problems