I think people are grouping into two flows.
One group is trying to get the LLM to basically one-shot everything, without properly reviewing the output.
Others are using the LLM to assist their human intelligence in a tight loop.
If you’re doing the former, you really do need the best model available, because at best that’s still right on the edge of what LLMs can do, and at worst you’re just shipping pure unmaintainable slop.
If you’re doing the latter, you can get away with a slightly less powerful model without it making a material difference, because your human intelligence is filling in the gaps.
The latter takes too many mental resources, and the same goes for truly reviewing the code generated by the former.
I generally start by reviewing, but after a while (a few hours at most) I just can't keep up and resort to LLMs as sole reviewers.
Not many want to admit this.
Well put. I belong to the latter group, as I feed the LLM small, granular tasks that I describe thoroughly. I have tried, however, just giving it a bigger-scope task. Even the best models produce sloppy code.
While the individual functions/classes/structs/... can be well thought out, the code as a whole tends to lack cohesion and, especially, maintainability. For instance, it never thinks: "I could put this logic behind an interface/trait so that if the requirements change I can simply add a concrete implementation that satisfies the new requirements (and potentially use one of these for testing)".
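To make that concrete, here is a minimal sketch of the kind of seam I mean, in Python using `typing.Protocol`. All names here (`Notifier`, `ConsoleNotifier`, `RecordingNotifier`, `close_ticket`) are made up for illustration and not from any real codebase.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Notifier(Protocol):
    """The interface: any class with a matching send() satisfies it."""

    def send(self, recipient: str, message: str) -> None: ...


class ConsoleNotifier:
    """One concrete implementation; stands in for a real email/SMS backend."""

    def send(self, recipient: str, message: str) -> None:
        print(f"to {recipient}: {message}")


@dataclass
class RecordingNotifier:
    """Test double: records calls instead of performing them."""

    sent: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


def close_ticket(ticket_id: int, owner: str, notifier: Notifier) -> None:
    """Business logic depends only on the interface, not on a backend."""
    notifier.send(owner, f"Ticket {ticket_id} was closed")


# In a test, swap in the recording implementation and inspect what was sent.
fake = RecordingNotifier()
close_ticket(42, "alice", fake)
```

If the requirements change (say, a chat backend instead of email), you add one more class with a `send()` method and `close_ticket()` stays untouched. That is the sort of forward-looking structure I rarely see the model introduce unprompted.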
Yes, that's also my experience.
SoTA models can do a reasonably good job on each ticket, but over time the architecture of the application starts degrading without a human in the loop.
The entropy increases more slowly with better models, but the trend is always towards slop.