I am not doing any of this.
It becomes obsolete in literally weeks, and it also doesn't work 80% of the time. Like, why write an MCP server for custom tasks when I don't know if the LLM is going to reliably call it?
My rule for AI has been steadfast for months (years?) now. I write documentation for myself (templates, checklists, etc.) by hand, not with AI, because otherwise I spend more time guiding the AI than thinking about the problem. I give the AI a chance to one-shot the task in seconds; if it can't, I either review my documentation or just do it manually.
A perspective which has helped me is viewing LLM-based offerings strictly as statistical document generators, whose usefulness is entirely dependent upon their training data set plus model evolution, and whose usage is best modeled as a form of constraint programming[0] lacking a formal (repeatable) grammar. As such, and considering the subjectivity of natural languages in general, the best I hope for when using them is quick iterations of refining constraint sentence fragments.
Here is a simple example which took 4 iterations using Gemini to get a result requiring no manual changes:
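The actual prompt isn't included here; a stand-in in the same constraint-fragment style, assuming the task is a small script that reports the HTTP status of a list of URLs, could look something like:

    Write a Python script that checks a list of URLs.
    The script MUST accept a file of URLs, one per line, as its only argument.
    The script MUST print each URL followed by its HTTP status code.
    The script MUST NOT use any third-party libraries.
    Every function MUST have a comment explaining what it does.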
EDIT: For reference, a hand-written script satisfying the above (excluding comments for brevity) could look like:
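A minimal stand-in for such a script, keeping to the hypothetical URL-checking task and the standard library (comments omitted, per the above):

    import sys
    import urllib.error
    import urllib.request

    def check(url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as e:
            return e.code

    def main():
        with open(sys.argv[1]) as f:
            for line in f:
                url = line.strip()
                if url:
                    print(url, check(url))

    if __name__ == "__main__":
        main()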
0 - https://en.wikipedia.org/wiki/Constraint_programming

Doesn't look like your handwritten example follows the direction on commenting.
(While we're at it, there's no need for an apostrophe when pluralising an initialism, like "URLs".)
One hack is to end the prompt with: "following solid architecture principles".
E.g.:

Step 1: Define the problem in PROBLEM.md.
Step 2: Ask the agent to gather scope from the codebase and update PROBLEM.md.
Step 3: Ask the agent to create a plan following design and architecture best practices (SOLID, etc.) and update PROBLEM.md.
Step 4: Ask the agent to implement PROBLEM.md.
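A minimal sketch of what PROBLEM.md could look like after Step 1 (the layout and the example problem are hypothetical, not a prescribed format):

    # PROBLEM
    ## Problem statement
    Users cannot reset their password from the mobile app.
    ## Scope
    (filled in by the agent in Step 2)
    ## Plan
    (filled in by the agent in Step 3)
    ## Constraints
    - Follow existing module boundaries; no new dependencies.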
Do you get tangibly different results if you don't capitalize MUST (NOT)?
The ability of newer agents to develop plans that can be reviewed and, most importantly, to run a build-test-modify cycle has really helped. You can task an agent with some junior-programmer task and then go off and do something else.
This is literally it.
High upside (if the AI manages to complete the task, it's a time saver), relatively low downside (you don't get stuck in those AI feedback loops that are ultimately a waste of time).
Seems odd to me that people spend so much time promoting some of the least productive aspects of AI tooling.
An alternative is to view the AI agent as a new developer on your team. If existing guidance + one-shot doesn't work, revisit the documentation and guidance (i.e. the dotMD file), see what's missing, improve it, and try again. Like telling a new engineer "actually, here is how we do this thing". The engineer learns and next time gets it right.
I don't do MCPs much because of the effort and the security risks. But I find the loop above really effective. The alternative (one-shot or ignore) would be like hiring someone, then if they get it wrong, telling them "I'll do it myself" (or firing them)... But to each his own (and yes, AIs are not human).
I don't think you can say it learns - and that is part of the issue. Time spent mentoring a new colleague is well spent because it helps the colleague grow professionally.
Time spent hand-holding an AI agent is wasted when all your guidance inevitably falls out of the context window and it starts making the same mistakes again.
That's why you put it in either code documentation or context files (like dotMD).
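For example, a dotMD entry capturing that kind of correction can be as small as (the conventions below are hypothetical):

    ## Conventions
    - HTTP handlers stay thin; business logic lives in the service layer.
    - New code MUST come with unit tests next to the module it changes.
    - Do not add new dependencies without asking first.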
> The engineer learns and next time gets it right.
Anthropomorphizing LLMs like that is the path to madness. That's where all the frustration comes from.
On the contrary; stubborn refusal to anthropomorphize LLMs is where the frustration comes from. To a first approximation, the models are like little people on a chip; the success and failure modes are the same as with talking to people.
If you look, all the good advice and guidelines for LLMs are effectively the same as for human employees - clarity of communication, sufficient context, not distracting with bullshit, information hygiene, managing trust. There are deep reasons for that, and as a rule of thumb, treating LLMs like naive savants gives reliable intuitions for what works, and what doesn't.
I treat LLMs as statistics-driven compression of knowledge and problem-solving patterns.
If you treat them as such, it becomes clear where they might fail and where you might have to guide them.
Also treat them as something that has been biased during training to produce immediately impressive results. This is why they bundle everything into single files and write try/catch patterns where the catch returns mock data, just to make an impressive one-shot demo.
You have to actively fight against the above to make them prioritise the scalability of the codebase and of the solutions.
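A concrete illustration of that bias (a made-up fetch function, using requests purely for brevity):

    import requests

    def get_user(user_id):
        try:
            resp = requests.get(f"https://api.example.com/users/{user_id}", timeout=5)
            resp.raise_for_status()
            return resp.json()
        except Exception:
            # Looks great in a one-shot demo, silently hides every failure later.
            return {"id": user_id, "name": "Demo User", "plan": "pro"}

The usual fix is to let the exception propagate (or catch only specific, expected errors) so failures surface during the build-test-modify cycle instead of being papered over.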
Exactly this. People treat LLMs like they treat machines and then are surprised that "LLMs are bad".
The right mental model for working with LLMs is much closer to "person" than to "machine".
I agree. Software development is on an ascent to a new plateau. We have not reached that yet. Any skill that is built up now is at best built on a slope.
I've found that if it can't get it right within a few iterations, it's generally better to switch to writing with auto-complete, which is still quite quick compared to the days of old.
I think both are helpful:
1. starting fresh, because of context poisoning / long-term attention issues
2. lots of tools make the job easier if you give them a tool-discovery tool (based on Anthropic's recent post)
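A rough sketch of what a tool-discovery tool can boil down to (the registry, names, and scoring here are made up; Anthropic's post describes the idea, not this code):

    from dataclasses import dataclass

    @dataclass
    class Tool:
        name: str
        description: str

    REGISTRY = [
        Tool("create_invoice", "Create an invoice for a customer"),
        Tool("search_orders", "Search orders by customer, date, or status"),
        Tool("send_email", "Send an email to a customer"),
    ]

    def discover_tools(query: str, limit: int = 3) -> list[Tool]:
        # Return only the best-matching tools so the agent's context holds
        # a handful of definitions instead of the whole registry.
        words = set(query.lower().split())
        scored = [(len(words & set(t.description.lower().split())), t) for t in REGISTRY]
        return [t for score, t in sorted(scored, key=lambda s: -s[0]) if score > 0][:limit]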
We don't have reliable ways to evaluate all the prompts and related tweaking. I'm working towards this with my agentic setup. I added time travel for sessions based on Dagger yesterday, with forking, cloning, and a registry probably today.