Building your AI agent "toolkit" is becoming the equivalent of the perfect "productivity" setup where you spend your time reading blog posts, watching YouTube videos telling you how to be productive and creating habits and rituals...only to be overtaken by a person with a simple paper list of tasks that they work through.
Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
All I want is for my agent to save me time, and to become a _compounding_ multiplier for my output. As a PM, I mostly want to use it for demos and prototypes and ideation. And I need it to work with my fractured attention span and saturated meeting schedule, so compounding is critical.
I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.
How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/instable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.
Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?
Let me give you a counterexample. I'm working on a product for the national market, and i need to do all financial tasks, invoicing, submit to national fiscal databse etc. through a local accounting firm. So i integrate their API in the backend; this is a 100% custom API developed by this small european firm, with a few dozen restful enpoints supporting various accounting operations, and I need to use it programmatically to maintain sync for legal compliance. No LLM ever heard of it. It has a few hundred KB of HTML documentation that Claude can ingest perfectly fine and generate a curl command for, but i don't want to blow my token use and context on every interaction.
So I naturally felt the need to (tell Claude to) build a MCP for this accounting API, and now I ask it to do accounting tasks, and then it just does them. It's really ducking sweet.
Another thing I did was, after a particularly grueling accounting month close out, I've told Claude to extract the general tasks that we accomplished, and build a skill that does it at the end of the month, and now it's like having a junior accountant in at my disposal - it just DOES the things a professional would charge me thousands for.
So both custom project MCPs and skills are super useful in my experience.
Your use is maybe more vanilla than you think. I think you are just getting shit done. Which is good.
Claude and an mcp and skill is plain to me. Writing your own agent connecting to LLMs to try to be better than Claude code, using Ralph loops and so on is the rabbit hole.
What exactly does it do that a professional would charge you thousands for?
(I'm genuinely asking)
its not though if you're working in a massive codebase or on a distributed system that has many interconnected parts.
skills that teach the agent how to pipe data, build requests, trace them through a system and datasources, then update code based on those results are a step function improvement in development.
ai has fundamentally changed how productive i am working on a 10m line codebase, and i'd guess less than 5% of that is due to code gen thats intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what i'm doing.
But... plain Claude does that. At least for my codebase, which is nowhere close to your 10m line. But we do processing on lots of data (~100TB) and Claude definitely builds one-off tools and scripts to analyze it, which works pretty great in my experience.
What sort of skills are you referring to?
If you build up and save some of those scripts, skills help Claude remember how and when to use them.
Skills are crazy useful to tell Claude how to debug your particular project, especially when you have a library of useful scripts for doing so.
I think people are looking at skills the wrong way. It's not like it gives it some kind of superpowers it couldn't do otherwise. Ideally you'll have Claude write the skills anyway. It's just a shortcut so you don't have to keep rewriting a prompt all over again and/or have Claude keep figuring out how to do the same thing repeatedly. You can save lots of time, tokens and manual guidance by having well thought skills. Some people use these to "larp" some kind of different job roles etc and I don't think that's productive use of skills unless the prompts are truly exceptional.
This resonates with me. Sometimes I build up some artifacts within the context of a task, but these almost always get thrown away. There are primarily three reason I prefer a vanilla setup.
1. I have many and sometimes contradictory workflows: exploration, prototyping, bug fixing debugging, feature work, pr management, etc. When I'm prototyping, I want reward hacking, I don't care about tests or lint's, and it's the exact opposite when I manage prs.
2. I see hard to explain and quantify problems with over configuration. The quality goes down, it loses track faster, it gets caught in loops. This is totally anecdotal, but I've seen it across a number of projects. My hypothesis is that is related to attention, specifically since these get added to the system prompt, they pull the distribution by constantly being attended to.
3. The models keep getting better. Similar to 2, sometime model gains are canceled out by previously necessary instructions. I hear the anthropic folks clear their claude.md every 30 days or so to alleviate this.
[flagged]
Lots of money being made by luring people into this trap.
The reality is that if you actually know what you want, and can communicate it well (where the productivity app can be helpful), then you can do a lot with AI.
My experience is that most people don't actually know what they want. Or they don't understand what goes into what they want. Asking for a plan is a shortcut to gaining that understanding.
Or, as I like to put it: I need to activate my personal transformers on my inner embeddings space to figure what is it I really want. And still, quite often, I think in terms of the programming language I'm used to and the library I'm familiar with.
So, to really create something new that I care about, LLMs don't help much.
They are still useful for plenty of other tasks.
> Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Working on an unspecified codebase of unknown size using unconfigured tooling with unstated goals found that less configuration worked better than more.
This. At work I have described this phenomenon as the equivalent of tinkering with the margins and fonts in your word processor instead of just writing your paper.
Emacs init file bikeshedding comes to mind…
but now you can build your AI agent toolkit to work on your init file for you
My init.el file went from some 300 lines to under 50 with Claude's assistance. Some of that had to do with updating Emacs, but I really only use Emacs for Org mode so that contribution was minimal.
if you work on platforms, frameworks, tools that are public knowledge, then yeah. If there’s nothing unique to your project or how to write code in it, build it, deploy it, operate it, yeah.
But for some projects there will be things Claude doesn’t know about, or things that you repeatedly want done a specific way and don’t want to type it in every prompt.
example https://news.ycombinator.com/item?id=47501214