> That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

> Whenever I start working in a new codebase, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

Do you find that these details translate between models? It sounds like they don't translate across codebases for you?

I have mostly moved away from this sort of fine-tuning approach because of my experience a while ago with OpenAI's ChatGPT 3.5 and 4. The extra work on my end that was necessary with the older model wasn't needed with the new one, and sometimes it counterintuitively made performance worse by pointing the model at the way I'd do it instead of the way it might have the best luck with. ESPECIALLY with the sycophantic models, which will heavily index on "if you suggested that this thing might be related, I'll figure out some way to make sure it is!"

So more recently I generally stick to the "we'll handle a lot of the prompt nitty-gritty for you" IDE or CLI agent stuff, but I find they still fall apart on large, complex codebases, and the tricks don't translate across codebases.

Yes and no. The broader business context translates well, but each model has its own blind spots and hyperfocuses that you need to massage out.

* Business context - things like code quality/robustness, expected spec coverage, expected performance needs, and domain-specific knowledge. These generally translate well between models, but they can vary between codebases. For example, a core monolith is going to have higher standards than a one-off auxiliary service.

* Model focuses - different models have different tendencies when searching a codebase and building up their context. These are specific to each codebase, but they're relatively obvious when they happen. For example, in one codebase I work in, one model always seems to pick up our legacy notification system while another happens to find our new one. It's not really a skill issue; it's just the luck of the draw in how files are named and how each model searches. They each find a "valid" notification pattern in a different order.

LLMs are massively helpful for orienting to a new codebase, but it just takes some time to work out those little kinks.

This is like UB in compilers but 100x worse, because there's no spec, it's not even documented, and it could change without a compiler update.

It is nothing at all like UB in a compiler. UB creates invisible bugs that tend to be discovered only after things have shipped. This is code generation. You can just read the code to see what it does, which is what most professionals using LLMs do.

With the volume of code people are generating, no, you really can't just read it all. pg recently posted [1] that someone he knows is generating 10kloc/day now. There's no way people are using AI to generate that volume of code and reading it. How many invisible bugs are lurking in that codebase, waiting to be found some time in the future after the code has shipped?

[1] https://x.com/paulg/status/1953289830982664236
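For a rough sense of scale, here's a back-of-the-envelope sketch. The 10kloc/day figure is from pg's post; the review rates are my assumption (a commonly cited range for careful code review), not anything from the thread:

```python
# Rough check: can one person carefully read 10k generated LOC per day?
# Assumption (not from the thread): careful review throughput of ~200-400 LOC/hour.

GENERATED_LOC_PER_DAY = 10_000          # figure cited from pg's post
ASSUMED_REVIEW_RATES = (200, 400)       # assumed LOC reviewed per hour

for rate in ASSUMED_REVIEW_RATES:
    hours_needed = GENERATED_LOC_PER_DAY / rate
    print(f"At {rate} LOC/hour, reading 10k LOC takes {hours_needed:.0f} hours/day")

# Output:
# At 200 LOC/hour, reading 10k LOC takes 50 hours/day
# At 400 LOC/hour, reading 10k LOC takes 25 hours/day
# Under either assumption, a single reviewer would need more than 24 hours in a day.
```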

I read every line I generate and usually adjust things; I'm uncomfortable merging a PR I haven't put my fingerprints on somehow. From the conversations I have with other practitioners, I think this is pretty normal. So, no, I reject your premise.

My premise didn't have anything to do with you, so what you do isn't a basis for rejecting it. No matter what you or your small group of peers do, AI is generating code at a volume that all the developers in the world combined couldn't read if they dedicated 24hrs/day.
