This article doesn't address writing code with AI, just code review. My issue with agentic coding is that I make numerous micro-architectural decisions while programming. I almost never have a full spec up front and develop one as I consider what I am writing.
When using Claude Code or Codex, that is all gone. Claude Code is extremely eager to reach the end goal to the point that it feels like a fever dream to write code with it. In the end, I have low confidence about edge cases and fit into the project's architectural and design goals.
On top of that, I enjoy programming, reverse engineering, etc. and I feel that the LLMs, while able to solve some problems or deliver some features, take that fun away. I'm trying really hard to find a workflow with them that I'm confident in, but I fear that workflow is just chat, search, and being a rubber duck for my thoughts.
> This article doesn't address writing code with AI, just code review. My issue with agentic coding is that I make numerous micro-architectural decisions while programming. I almost never have a full spec up front and develop one as I consider what I am writing.
working with AI forced me to write better specs but the way I write today is very different. I typically open Codex and have Linear MCP connected where my chat with the AI will end up writing the issue. Its a lot of back-end-forth where I tell what I want, the AI does all the code scanning, write something, I correct something, etc
The value for me is exactly that I tell what I want, the AI verify in the actual code if that's the path that makes more sense or not. In the end I have a pretty detailed spec that I'm much more confident is the correct path.
I find the spec easier to review than a huge PR so typically when executing is much faster and aligned with what I want.
The grill-me skill from Matt Pocock is great for this (https://github.com/mattpocock/skills/blob/main/skills/produc...)
> but I fear that workflow is just chat, search, and being a rubber duck for my thoughts.
That's still a lot of benefit, though. I have to agree with Patrick McKenzie on this one (https://x.com/patio11/status/2058631943785488815):
> If the only impact of LLMs professionally was causing people to "think out loud" in a way which was routinely captured by computer systems and then could be operated on by computer systems, that would by itself be one of the most consequential changes in practice in 100 years
> I fear that workflow is just chat, search, and being a rubber duck for my thoughts
This is exactly what I settled upon after my own trying really hard. It is liberating, I have no fear at all!
> On top of that, I enjoy programming, reverse engineering, etc. and I feel that the LLMs, while able to solve some problems or deliver some features, take that fun away.
Same, I prefer asking one or multiple very technical questions to Gemini, analyze, compare and understand the responses then implement it myself based on what I learned (or just integrate it to the codebase as it is, if I asked it to write a function) than delegating away all the fun to an agent.
I find using the LLM to generate different git repo skeletons for the same class of project using the 4-5 different programming languages I’m familiar with is really interesting and helpful. Then I ask it to explicitly describe its design decisions for different parts of the small codebase, i.e. what do the internal APIs look like, so that if you make changes in one section of the codebase, you can be sure you don’t accidentally generate problems in another section of the codebase. Only once you’ve worked out all such constraints, clarified dependencies, etc. do you start generating code in each subsection and that’s done using the specific constraints for that section in each prompt, and reviewing all the code. This is also when you generate the tests for each subsection. Finally this is where using a different LLM(s) for code review after the code is written becomes important. It’s a slow process certainly but it seems to work pretty well.
A lot of programming work is well represented in the training data. For that kind of stuff there’s not much to do regarding architectural decisions. I love to run the LLMs on auto for that work. But for anything not well represented in the training data, which could be anything from mundane stuff in PyQT or a truly novel application, keep them on a short leash or forget them altogether.
> represented in the training data
This isn’t a binary is/isn’t thing though. What if only 80% of my task is, how would I know that the other part isn’t, if I haven’t worked it through fully
What if my task is generally represented, but for my specific context, there are specific details that aren’t?
How would I know until I’ve reasoned through it myself? At that point having the LLM do the work doesn’t add much value
[flagged]