> One of the most useful things about AI is also one of the most humbling: it reveals how clear your own judgment actually is. If your critique stays vague, your taste is still underdeveloped. If your critique becomes precise, your judgment is stronger than the model output. You can then use the model well instead of being led by it.
Something I find teams get wrong with agentic coding: they start by reverse-engineering docs from an existing codebase. This is a mistake.
Instead, the right train of thought is: "what would perfect code look like?" and then meticulously describe to the LLM what "perfect" is to shape every line that gets generated.
This exercise is hard for some folks to grasp because they've never thought much about what well-constructed code or architecture looks like; they have no "taste" and thus no ability to precisely dictate the framework for "perfect" (yes, there is some subjectivity that reflects taste).
> Instead, the right train of thought is: "what would perfect code look like?" and then meticulously describe to the LLM what "perfect" is to shape every line that gets generated.
I think this goes against what a lot of developers want AI to be (not me, to be clear).
Also a lot of middle managers. Many organizations enthusiastically adopting AI are doing so because they want to appeal to the authority of the bots and bludgeon colleagues with it.
I'm looking at it from a team perspective.
With the right docs, I can lift every developer of every skill level up to a minimum "floor" and influence every line of code that gets committed to move it closer to "perfect".
I'm not writing every prompt so there is still some variation, but this approach has given us very high quality PRs with very minimal overhead by getting the initial generation passes as close to "perfect" as reasonably possible.
Oh I agree with you, I'm just saying a lot of developers don't want to use it like that. AI has liberated them from the drudgery of reading and writing code and they won't accept that they should still be doing a bit of both, if not a lot of reading.
It does amaze me when colleagues refuse to read what I (personally, deliberately) wrote (they ask AI to summarize), but then tell AI to write their response and it's absolutely bloated and full of misconceptions about my original document.
If they aren't willing to read what I put effort into, why should I be expected to read the ill-conceived and verbose response? I really don't want to get into a match of my AI arguing with your AI, but that's what they've told me I should be doing...
I've been having ongoing issues with a manager who responds in the form of Claude-guided PRs. Undoubtedly driven by confused prompts. Always full of issues, never actually solving the problem, always adding HEAPS of additional nonsense in the process.
There's an asymmetry of effort in the above, and when combined with the power asymmetry - that's a really bad combo, and I don't think I'm alone.
I'm glad to see the appreciation of the enormous costs of complexity on this forum, but I don't think that has ascended to the managerial level.
In my current role, I have shifted from lead IC to building the system that is used by other ICs and non-ICs.
From my perspective, if I can provide the right guardrails to the agent, then anyone using any agent will produce code that coalesces around a higher baseline of quality. Most of my IC work now is aligned with this direction.
Ya, I can't stand that. Asking a question and being hit with "this is what Claude said" gives me a new kind of rage.
Yeah, this happened to me recently and the advice could have caused data corruption (yay old systems). I only caught it because they asked before making changes and I had a vague memory of it from having investigated the same thing almost a decade ago (and found the note and explanation with a link to a bugtracker in my personal wiki).
It doesn't matter, one way or the other. The overall market share will decide. In some cases, I think good code will be a decisive factor. Think the Steam launcher vs. Epic's: Epic doesn't have good code, and their performance suffers as a consequence. In other cases the users are so trapped it makes no difference; MS Outlook and Teams are the prime example of this.
This matches my experience exactly. I built a tool that sends code to three different AI models for review because the model that wrote the code can't critique it honestly. It has all the context and actively suppresses objections. The second model, with zero context, immediately finds things the first one rationalized away. Taste isn't just knowing what good looks like, it's being willing to say "this isn't it" to your own work. AI can't do that to itself yet.
I don't think you really need a tool for that; you can just add something like "after the task is finished, have a subagent review the work in an adversarial fashion. If any defects, no matter how small, are found, have another subagent implement the findings. Repeat this in a loop until all subagents achieve consensus that the product is of exceptional quality with no defects" or similar to each prompt. Each subagent gets its own fresh context window. No tooling required.
I've worked in too many large codebases where no one can point to any _single file or class_ and label it "correct," ("the right way") yet management is amazed when the lack of a "North Star" means the codebase is full of overlapping, piecemeal patterns that are lucky to work together at all.
That's why the team needs someone with "taste" to dictate the idiomatic way to do it and why LLMs (when used this way) can raise the floor of quality and baseline of consistency.
> Instead, the right train of thought is: "what would perfect code look like?"
That's the classic 2nd-system effect - "let's rewrite it from scratch, now that we know what we want". And you throw away all the hard-learned lessons.
https://en.wikipedia.org/wiki/Second-system_effect
Not really the case; you're misunderstanding the term second system effect.
It's the exact opposite: by explicitly dictating what is correct, perfect, and standard in this codebase, we achieve very high consistency and quality with very little "embellishment" and excess, because the LLM is following a set of highly curated instructions rather than the whims of each developer on the team. I'd suggest re-reading what Brooks meant by "second-system effect".
> Instead, the right train of thought is: "what would perfect code look like?" and then meticulously describe to the LLM what "perfect" is to shape every line that gets generated.
I don't think there's perfect code.
Code is automation - it automates human effort, and humans themselves are error-prone, hence not perfect.
So as long as code meets or exceeds the human output, it's "good enough" and meets expectations. That's what a typical customer cares about.
A customer will happily choose a tent made of tarp and plastic sticks that's available at their budget, right now when it's raining outside, over an architectural marvel that will be available sometime in the future at some unknown pricepoint.
Put another way, I don't think if you built CharlieGPT today, where the only differentiating factor over ChatGPT was that CharlieGPT was written using perfect code, you would have any meaningful edge.
I have yet to see any evidence that, everything else being equal, one company had an edge over another simply due to superior code.
In fact, I have overwhelming evidence of companies with better code succumbing and vanishing against companies that had very little code, if any, because those dollars were instead invested in better customer discovery, segmentation, and analytics ("what should we build?", "if we did one thing that would give our customers an unfair advantage, what would it be?").
Software history is full of "perfect" OSes, editors, frameworks, and protocols that were lost over time because a provably inferior option won market share.
You are using a software-controlled SMPS (switch-mode power supply) to power your device right now. You have no idea what the quality of that code is. All you care about is whether that SMPS drains your battery prematurely or heats up your device unnecessarily. It's extremely unlikely that such an efficient, low-overhead control system was written using well-abstracted modules. It's more likely that the control system is full of gotos and repeated violations of DRY that would make a perfectionist shudder and cry.
You only need to accurately describe what "perfect" is to the LLM instead of allowing it to regress to the mean of its training set. There really is no cost difference between writing shitty code and "perfect" code now; it's just a matter of how good you are at describing "perfect" to the LLM.
For example, we very specifically want our agents to write code using C# tuple return types for private methods that return more than one value, instead of creating a class. The tuple return type is a stack-allocated value type and supports deconstruction by default. We also always use named tuple fields, because they remove ambiguity for humans and increase efficiency for agents when re-reading the code.
We want the code to make use of pattern matching and switch expressions (not `switch-case`) because they help enforce exhaustive checks at compile time and make the code more terse.
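A minimal sketch of what these two conventions produce (the class and method names here are illustrative, not from our actual codebase):

```csharp
using System;

public static class OrderParser
{
    private enum Status { Valid, Empty, Malformed }

    // Named tuple fields: callers see (status, quantity), not Item1/Item2,
    // and no one-off result class is needed for this private method.
    private static (Status status, int quantity) Parse(string input) =>
        string.IsNullOrWhiteSpace(input)
            ? (Status.Empty, 0)
            : int.TryParse(input, out var n)
                ? (Status.Valid, n)
                : (Status.Malformed, 0);

    public static string Describe(string input)
    {
        // Tuples deconstruct by default; no Deconstruct boilerplate.
        var (status, quantity) = Parse(input);

        // Switch expression (not switch-case): the compiler warns when a
        // Status value is unhandled, keeping the branching exhaustive.
        return status switch
        {
            Status.Valid => $"ordered {quantity}",
            Status.Empty => "no input",
            Status.Malformed => "bad input",
            _ => "unknown",
        };
    }
}
```

Example usage: `OrderParser.Describe("5")` yields `"ordered 5"`, while malformed input falls into the `Malformed` arm rather than throwing.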
If we simply tell the agent these rules ahead of time, we get "perfect", consistent code each time. Being able to do so requires "taste" and understanding why writing code one way or using a specific language construct or a specific design pattern is the "right" way.
> There really is no cost difference between writing shitty code and "perfect" code now; its just a matter of how good you are at describing "perfect" to the LLM.
The consequent is at odds with the antecedent. It's a performative contradiction (if the output were truly "free", the skill of the operator would be a zero-value variable; yet, by requiring skill, you acknowledge a cost), as I show below.
> The cost of "perfect" is only perhaps a few fractions of a cent higher than shitty.
Is your cost model accounting for the cost of specification, of review and additional cycles required if review fails or the specification itself needs to be adjusted?
> If we simply tell the agent these rules ahead of time, we get "perfect", consistent code each time
No, in the simplest case, your cost of perfection is simply moving up the chain of abstraction from implementation (coding) to design and specification. In reality it also splits and moves a part of that cost downstream to verification.
This isn't some special, magical insight I have, I'm reiterating Tesler's Law right back to you.
I also encourage you to read software history: for decades it has been trivial to spit out perfectly working CRUD from ER and UML diagrams, no LLM necessary. The insight is in understanding why we continue to hire cheap human labor to spit out CRUD instead of using those tools.
The cost of software is, and always has been, in figuring out the intent, not in the generation of syntax.
I wish pg was more active on HN - I expect this is one of the reasons why he wanted founders to have and share the painpoints of their (potential) customers. Figuring out the intent is expensive. Mistake the intent and the best case scenario is a pivot.