I can't quite tell what's being compared there -- just looks like several different LLMs?
To be clear, I'm suggesting that any specific format for "skills.md" is a red herring, and all you need to do is provide the LLM with good clear documentation.
A useful comparison would be between: a) making a carefully organised .skills/ folder, b) putting the same info anywhere and just linking to it from your top-level doc, c) dumping everything directly in the top-level doc.
My guess is that it's probably a good idea to break stuff out into separate sections, to avoid polluting the context with stuff you don't need; but the specific way you do that very likely isn't important at all. So (a) and (b) would perform about the same.
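Concretely, with made-up file names, (a) vs (b) vs (c) would be roughly:

    # (a) carefully organised skills folder
    .skills/
      deploys/SKILL.md        <- short description + pointer to the details
      migrations/SKILL.md

    # (b) same info anywhere, linked from the top-level doc
    AGENTS.md                 <- "Deploys: see docs/deploys.md. Migrations: see docs/migrations.md."
    docs/deploys.md
    docs/migrations.md

    # (c) everything dumped into the top-level doc
    AGENTS.md                 <- all of the above, inline

In (a) and (b), only the pointer sits in the context until the detail is actually needed.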
Your skepticism is valid. Vercel ran a study where they found that skills underperform a docs index in AGENTS.md [0].
My guess is that the standardization is going to make its way into how the models are trained, and Skills will eventually pull ahead.
0: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Agents already put a docs index into context when skills are enabled (the skill names and descriptions), so this finding is really that the current implementation of skills in Claude Code is suboptimal, not that the approach is.
Their reasoning about it is also flawed. E.g. "No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present." - but this is exactly the case for skills too. The difference is just where in the context the information is, and how it is structured.
Having looked at their article, ironically I think the reason it works is that they likely force more information into context by giving the agent less information to work with:
Instead of a description, which might convince the agent a given skill isn't relevant, their index is basically a list of vague filenames, forcing the agent to guess and potentially read the wrong thing.
This is exactly what skills were added to avoid. But it breaks down if the description isn't precise enough, and it's perfectly possible that current tooling isn't aggressive enough about pruning detail that might tempt the agent to ignore relevant files.
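To make the difference concrete, a skill's front matter is meant to carry that decision signal up front, something roughly like this (made-up example):

    ---
    name: db-migrations
    description: Use when writing or reviewing database migrations in this
      repo; covers naming rules, rollback requirements, and the checklist.
    ---

whereas the Vercel-style index entry is closer to a bare "docs/migrations.md", which tells the agent almost nothing until it opens the file.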
The current tooling falls short in another way: checking for skills isn't the first thing the agent does when prompted, at least in Claude Code. More often than not I have to remind the agent that the skill exists before it does anything; it's very rare for it to pick a skill unprompted. That kind of defeats the purpose of skills for me: if I have to tell the thing to go look somewhere, I might as well make any old document folder in any format and tell it to look there.
> If you want a clean comparison, I’d test three conditions under equal context budgets: (A) monolithic AGENTS.md, (B) README index that links to docs, (C) skills with progressive disclosure. Measure task success, latency, and doc-fetch count across 10–20 repo tasks. My hunch: (B)≈(C) on quality, but (C) wins on token efficiency when the index is strong. Also, format alone isn’t magic: skills that reference real tools/assets via the backing MCP are qualitatively different from docs-only skills, so I’d separate those in the comparison. Have you seen any benchmarks that control for discovery overhead?
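For what it's worth, a minimal sketch of such a harness could look like the following; every name here (run_agent, Outcome, the task list) is hypothetical, not any existing tool:

    import time
    from dataclasses import dataclass

    @dataclass
    class Outcome:
        passed: bool        # did the task's checks pass?
        doc_fetches: int    # how many docs/skills the agent opened (discovery overhead)
        total_tokens: int   # for the token-efficiency comparison

    def run_agent(task: str, docs_layout: str) -> Outcome:
        # Placeholder: a real harness would launch the agent against a repo
        # prepared with the given docs layout and score the result.
        return Outcome(passed=False, doc_fetches=0, total_tokens=0)

    CONDITIONS = ["monolithic_agents_md", "readme_index", "skills_progressive"]
    TASKS = ["fix_failing_test", "add_api_endpoint", "write_db_migration"]  # 10-20 in practice

    results = []
    for condition in CONDITIONS:
        for task in TASKS:
            start = time.monotonic()
            outcome = run_agent(task, docs_layout=condition)
            results.append({
                "condition": condition,
                "task": task,
                "success": outcome.passed,
                "latency_s": time.monotonic() - start,
                "doc_fetches": outcome.doc_fetches,
                "tokens": outcome.total_tokens,
            })

Within condition (C), it would also be worth splitting docs-only skills from tool-backed ones, per the point above.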