It's so nice that skills are becoming a standard; they are imo a much bigger deal long-term than e.g. MCP.
Easy to author (at its most basic, just a markdown file), context-efficient by default (only the YAML front-matter is preloaded; more markdown files can be lazy-loaded as needed), and they can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli).
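To make that concrete, a minimal skill along those lines might look something like this (the front-matter fields follow the SKILL.md convention of a preloaded `name` and `description`; the specific commands and repo workflow are just an illustration):

```markdown
---
name: github-via-gh
description: How to inspect issues and open pull requests in this repo using the `gh` CLI.
---

# Working with GitHub via `gh`

- Check auth first: `gh auth status`.
- List open PRs: `gh pr list --limit 20`.
- Read an issue with its discussion: `gh issue view <number> --comments`.
- Open a draft PR: `gh pr create --fill --draft`, then mark it ready once CI is green.
```

Only the front-matter sits in context by default; the body below it is what gets lazy-loaded when the agent decides the skill is relevant.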
Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).
Part of the effectiveness here is that AI models are heavy enough that running a sandbox VM for them on the side is likely irrelevant cost-wise, so the major chat UI providers now all give the model such a sandboxed environment - which means skills can also contain Python and/or JS scripts. Again, much simpler, more straightforward, and more flexible than e.g. requiring the target to expose remote MCPs.
Finally, you can use a skill to tell your model how to properly approach using your MCP server - something that previously often required either long prompting or a purpose-specific system prompt, with the cons I've already described.
On top of everything you've described, one more advantage is that you can use the agents themselves to edit / improve / add to the skills. One easy one to do is something like "take the key points from this session and add the learnings as a skill". It works both on good sessions with new paths/functionality and on "bad" sessions where you had to hand-hold the agent. And they're pretty good at summarising and extracting tidbits. And you can always skim the files and do quick edits.
Compared to MCPs, this is a much faster and more approachable flow to add "capabilities" to your agents.
I think taking key points from a session and making a new skill is less useful than "precaching": folding the key findings into related or affected skills, which in most cases eliminates the need for a new skill.
On the other hand, much like pure functions in code, new skills that don't leak responsibilities into each other can be more atomic and efficient in the long run. Both approaches have their pros and cons.
Add reinforcement learning to figure out which skills are actually useful, and you're really cooking.
DSPy with GEPA should work nicely, yeah. Haven't tried yet but I'll add it to my list. I think a way to share within teams is also low-hanging fruit in this space (outside of just adding them to the repo). Something more org-generic.
> DSPy with GEPA should work nicely
I think that would be a really really interesting thing to do on a bunch of different tasks involving developer tooling (e.g. git, jj, linters, etc.)
Combine that with retrying the same task with the improved skills in some sort of training loop, so the model learns to bake the skills in natively and eventually obviates the need for them.
The path to recursive self-improvement seems to be emerging.
Perhaps you could help me.
I'm having a hard time figuring out how I could leverage skills in a medium-sized web application project.
It's Python, PostgreSQL, Django.
Thanks in advance.
I wonder if skills are more useful for non-CRUD projects. Maybe data science and DevOps.
There’s nothing super special about it; it’s just handy if you have some instructions that you don’t need the AI to see all the time, but that you’d like it to have available for specific things.
Maybe you have a custom auth backend that needs an annoying local proxy setup before it can be tested: you don’t need all of those instructions in the primary agents.md bloating the context on every request, and a skill lets you separate them so they’re only accessed when needed.
Or if you have a complex testing setup and a multi-step process for generating realistic fixtures and mocks: the AI maybe only needs some basic instructions on how to run the tests 90% of the time, but when it’s time to make significant changes it needs info about your whole workflow and philosophy.
I have a Django project with some hardcoded constants that I source from various third-party sites, which need to be updated periodically. Originally that meant sitting down, visiting a few websites, and copy-pasting identifiers from them. As AI got better at web search, I was able to put together a prompt that did pretty well at compiling them. With a skill I can have the AI find the updated info and update the code itself, and I can provide it some little test scripts to validate it did everything right.
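A sketch of what that kind of skill could look like (all file, script, and source names here are made up for illustration):

```markdown
---
name: update-provider-constants
description: Refresh the hardcoded third-party identifiers in constants.py and verify the result.
---

# Updating the provider constants

1. Look up the current identifiers on the sites listed in `sources.md` (bundled with this skill).
2. Edit `myapp/constants.py`, keeping the existing dict structure and comments.
3. Run `python scripts/check_constants.py`; it validates formats and flags any identifier that vanished.
4. If the check passes, run the tests for the affected module only.
```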
Thanks. I think I could use skills as "instructions I might need but I don't want to clutter AGENTS.md with them".
Yes exactly. Skills are just sub agents.md files + an index. The index tells the agent about the content of the .md files and when to use them. Just a short paragraph per file, so it's token efficient and doesn't take much of your context.
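Roughly, the layout being described might look like this (the file names and one-liners are just an example):

```markdown
# AGENTS.md (always in context)

## Skill index
- `skills/testing.md` - how to run the test suite and generate realistic fixtures; read before touching tests.
- `skills/migrations.md` - conventions for writing and reviewing Django migrations.
- `skills/deploy.md` - the release checklist; only needed when cutting a release.
```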
Poor man's "skills" is just manually managing and adding different .md files to the context.
Importantly every time you instruct the agent to do something correctly that it did incorrectly before, you ask it to revise a relevant .md file/"skill", so it has that correction from now on. This is how you slowly build up relevant skills. Things start out as sections in your agents.md file, and then graduate to a separate file when they get large enough.
Yes, but also: because skills are a semi-special construct, agents are better at leveraging them when needed, and you can easily tap into them explicitly (eg “use the PR skill to open a PR”).
You could, for example, create a skill to access your database for testing purposes and include your table specifications so that the agent can easily retrieve data for you on the fly.
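For instance, a read-only database skill could be as small as this (the connection string, role, and table names below are hypothetical):

```markdown
---
name: dev-db-readonly
description: How to query the development Postgres database (read-only) to look up test data.
---

# Querying the dev database

- Connect with `psql "$DEV_DATABASE_URL"`; the role behind that URL only has SELECT privileges.
- Key tables: `accounts_user`, `billing_invoice`, `catalog_product` (full column lists are in `schema.md` next to this file).
- Keep queries `LIMIT`ed and paste only the rows you actually need back into the conversation.
```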
I made a small MCP script for the database with 3 tools:
- listTables
- getTableSchema
- executeQuery (blocks destructive queries like anything containing DROP, DELETE, etc.)
I wouldn't trust textual instructions to prevent an LLM from dropping a table.
That's why I give the LLM a readonly connection
This is much better than MCP, which also stuffs every session's precious context with potentially irrelevant instructions.
They could just make MCPs dynamically loaded in the same way, no?
It is still worse, as it consumes more context giving instructions for custom tooling, whereas the LLM already understands how to connect to and query a read-only SQL service with standard tools.
Oooooo, woah, I didn't really "get it" thanks for spelling it out a bit, just thought of some crazy cool experiments I can run if that is true.
It’s also for (typically) longer content that you don’t always want the agent to have in its context. If you always want it in context, use rules (memories); but if it’s something more involved or less frequently used (perhaps some debugging methodology, or designing new data schemas), skills are probably a good fit.
Skills are not useful for single-shot cases. They are for cross-team standardization (of LLM-generated code) and reliable reuse of existing code/learnings.
Skills are the Matrix scene where Neo learns kung fu. Imagine they are a database of specialized knowledge that an agent can instantly tap into _on demand_.
The key here is “on demand”. Not every agent or conversation needs to know kung fu. But when they do, a skill is waiting to be consumed. This basic idea is “progressive disclosure”, and it composes nicely to keep context windows focused. Eg I have a metabase skill to query analytics. Within that, I conditionally refer to how to generate authentication if they aren't authenticated. If they are authenticated, that information need not be consumed.
Some practical “skills”: writing tests, fetching Sentry info, using Playwright (a lot of local MCPs are just flat-out replaced by skills), submitting a PR according to team conventions (eg run lint, review code for X, title matches format, etc.)
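As an example of that last one, a PR skill can be little more than the team checklist written down (the make targets, ticket format, and label below are invented for illustration):

```markdown
---
name: submit-pr
description: Team conventions for opening a pull request from this repo.
---

# Opening a PR

1. Run `make lint` and `make test`; fix anything that fails before continuing.
2. Self-review the diff for stray debug logging and leftover TODOs.
3. Title format: `[TICKET-123] Short imperative summary`.
4. Open the PR with `gh pr create --fill` and add the `needs-review` label.
```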
Could you explain more about your metabase skill and how you use it? We use metabase (and generally love it) and I’m interested to hear about how other people are using it!
It's really just some rules around auth, some precached lookups (eg databases with ids and which to use), and some explanations around models and where to find them. Everything else it pretty much knows on its own.
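So, very roughly, something in this shape (the ids, collection names, and auth file are placeholders, not the actual setup):

```markdown
---
name: metabase-analytics
description: How to query product analytics through Metabase.
---

# Querying Metabase

- Use database id 2 ("Product warehouse") for event data; id 5 is a stale replica, ignore it.
- The `Orders` and `Active Users` models live in the "Core" collection.
- If a request returns 401, follow `auth.md` in this skill to get a new session token; otherwise skip that file entirely.
```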
Nice analogy!
I can't claim credit. I'm pretty sure I've seen Anthropic themselves use it in the original explainers.
There could be a Django template skill, for example: just a markdown file that reminds the LLM of the syntax of Django templates and best practices for them. It could include a script that the LLM can use to test a single template file, for example.
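A sketch, with an assumed helper script name and assumed project conventions:

```markdown
---
name: django-templates
description: Django template syntax reminders, project conventions, and a render check for single template files.
---

# Django templates in this project

- Use `{% url %}` and `{% static %}`; never hardcode paths.
- Every page template extends `base.html`; the page-specific blocks are `content` and `extra_js`.
- To sanity-check one template, run `python scripts/render_template.py templates/<path>.html`; it renders the file with dummy context and reports syntax errors.
```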
So a skill is effectively use case / user story / workflow recipe caching