I've been building something in this space ("Clink" - multi-agent coordination layer) and this research confirms some of the assumptions that motivated the project. You can't just throw more agents at a problem and expect it to get better.
The error amplification numbers are wild! 17x for independent agents vs 4x with some central coordination. Clink provides users (and more importantly their agents) the primitives to choose their own pattern.
The most relevant features are:

- work queues with claim/release for parallelizable tasks
- checkpoint dependencies when things need to be sequential
- consensus voting as a gate before anything critical happens
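To make the first primitive concrete, here's a minimal sketch of a claim/release work queue. This is illustrative only, not Clink's actual API; the class and method names are my own invention. The key property is that claiming is atomic, so two parallel agents can never grab the same task:

```python
import threading
import uuid

class WorkQueue:
    """Minimal claim/release work queue (illustrative, not Clink's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}  # task_id -> {"payload": ..., "claimed_by": None}

    def add(self, payload):
        task_id = str(uuid.uuid4())
        with self._lock:
            self._tasks[task_id] = {"payload": payload, "claimed_by": None}
        return task_id

    def claim(self, agent_id):
        # Atomically hand one unclaimed task to the requesting agent.
        with self._lock:
            for task_id, task in self._tasks.items():
                if task["claimed_by"] is None:
                    task["claimed_by"] = agent_id
                    return task_id, task["payload"]
        return None  # nothing left to claim

    def release(self, task_id, done=False):
        # Return an abandoned task to the pool, or drop a finished one.
        with self._lock:
            if done:
                self._tasks.pop(task_id, None)
            elif task_id in self._tasks:
                self._tasks[task_id]["claimed_by"] = None
```

The point is that coordination lives in the queue, not in the agents, which is exactly what keeps independent workers from duplicating or clobbering each other's work.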
The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together. It feels like we're all still learning what works here. The docs are at https://docs.clink.voxos.ai if anyone wants to poke around!
What are your other primitives for orchestration?
> The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together.
It works really well. Whatever knowledge LLMs absorb about CLI commands seems to transfer to MCP use, so a single tool with commands/subcommands holds up fine. It's the pattern I default to when I'm forced to use an MCP server instead of providing a CLI tool (like when the MCP server needs to be in-memory with the host process).
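The pattern is basically a command dispatcher behind one tool entry point. A rough sketch (all names hypothetical, just to show the shape):

```python
# Registry mapping subcommand names to handler functions.
COMMANDS = {}

def command(name):
    """Decorator that registers a handler under a subcommand name."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("queue.claim")
def queue_claim(agent_id: str):
    # Placeholder handler; a real server would hit its queue here.
    return {"task": "example"}

@command("queue.release")
def queue_release(task_id: str, done: bool = False):
    return {"ok": True}

def clink_tool(cmd: str, **args):
    """The single exposed MCP tool: routes to a registered subcommand.

    Unknown commands return the list of valid ones, which doubles as
    in-band discovery for the model.
    """
    handler = COMMANDS.get(cmd)
    if handler is None:
        return {"error": f"unknown command {cmd!r}",
                "available": sorted(COMMANDS)}
    return handler(**args)
```

Returning the available commands on error is a nice touch in practice: the model self-corrects instead of stalling.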
I've started with the basics for now: messages (called "Clinks" because... marketing), groups, projects, milestones - all fairly non-novel, and one might say this is just Slack/Jira. What distinguishes it are proposals, which facilitate distributed consensus behaviour between agents. That's paired with a human-in-the-loop proposal type that requires the fleet owner to respond via email.
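A proposal used as a consensus gate might look something like this. The field names and threshold semantics here are assumptions for illustration, not Clink's actual schema:

```python
class Proposal:
    """Illustrative consensus gate: an action only proceeds once enough
    eligible agents have approved. Not Clink's actual schema."""

    def __init__(self, description, voters, threshold=0.66):
        self.description = description
        self.voters = set(voters)      # agents eligible to vote
        self.threshold = threshold     # fraction of voters needed
        self.votes = {}                # agent_id -> bool

    def vote(self, agent_id, approve):
        if agent_id not in self.voters:
            raise ValueError(f"{agent_id} is not an eligible voter")
        self.votes[agent_id] = approve

    def passed(self):
        # Abstentions count against the proposal: approvals are measured
        # over all eligible voters, not just those who voted.
        approvals = sum(1 for v in self.votes.values() if v)
        return approvals / len(self.voters) >= self.threshold
```

The human-in-the-loop variant would be the same shape with the fleet owner as a required voter whose ballot arrives over email.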
That's great to hear. It makes sense given the MCP server in this case is mainly just a proxy for API calls. One thing I wonder is at what point do you decide your single tool description packs in too much context? Do you introduce a tool for each category of subcommands?
Wouldn't it be better just to stack the functionalities of multiple agents into a single agent instead of taking on this multi-agent overhead and failure risk? Many people in academia consider multi-agent systems to be an artifact of the current crop of LLMs; as recent models handle longer contexts reliably and call larger numbers of tools more dependably, multi-agent systems seem less and less necessary.
In some cases, you might actually want to cleanly separate parallel agents' context, no? I suppose you could make your main agent with stacked functionalities responsible for limiting the prompts of any subagents it spawns.
My hunch is that we'll see a number of workflows that will benefit from this type of distributed system. Namely, ones that involve agents having to collaborate across timezones and interact with humans from different departments at large organizations.
Can you explain a use case for Clink?
Coordination of workflows between people using different LLM providers is the big one. You prefer Anthropic's models, your coworker swears by OpenAI's. None of these companies are going to support frameworks/tools that allow agent swarms to use anything other than their own models.