Easy, have Claude review the code, tell it to be critical and that it needs to be easier to understand, follow Clean Code, SOLID principles and best practices. Lie to it, say you got this from a Junior developer, or "review it as if you were a Staff Level Engineer reviewing Junior code" the models can write better code, just nobody tells them to.

Lol, the only thing worse than a junior developer following Clean Code and SOLID has to be an LLM messing with code so it looks like it follows.

Clean Code has its really "meh" areas, but the core idea and spirit of it is sound, heck Python's best guide is PEP-8 if you follow that, it forces you to write much better Python code.

In terms of "junior dev following" it would be the model trying to think and write it as a Senior or Staff Level engineer would.

Code review is the main thing I use LLMs for. I have found it to be remarkably candid when you tell it the code came from another LLM (even name it). I was running Kimi K2.6 Q4 locally, seeing if it could SIMD a bit-matrix transpose function, and it was slow enough that I would paste its thinking into Gemini every few minutes. Gemini was savage.

> Gemini was savage.

Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.

This is it. I've had a similar experience in just playing around I asked it to clean up some code it wrote to increase maintainability and readability by humans. After a few iterations it had generated quite solid code. It also broke the code a couple of times along the way. But it does get me thinking that these pipelines with agents doing specific tasks makes a lot of sense. One to design and architect, one to implement, one to clean, one to review, one to test (actually there's probably a bunch of different agents for testing -- testing perf/power, that it matches the requirements/spec, matches the design, is readable/maintainable, etc...).

I built GuardRails after some frustrations with Beads which I love, and this whole exchange made me realize, because I have "gates" after tasks, I could add a "Review the code" type of gate, and probably get insanely better output, I already get reasonably good output because I spec out the requirements beforehand, that's the other thing, if you can tell the LLM HOW to build before it does, you will have better output.

I've had success with this approach also. You do feel empowered, 10x or whatever: but then you're looking at more projects, context switching a lot more, and it can burn you out.

Why wouldn't Claude just impose this same loop in the code it writes - or better, write better code before it needs such review?

Because language models don’t think before doing, they think by doing.

Maybe a more idealized training set could improve things, but at least for today’s SOTA, you have to get the shitty first draft out and then improve it.

Harnessing makes a difference, but it’s only shuffling around when and where the tokens get generated. It can trade being slower by doing a hidden first draft and only showing the output after doing a self review. But the models still need to generate it all explicitly.

Why would it? It doesn't do anything with intention without being prompted. When you ask it to do something it's going to give you what seems like the most likely result, it isn't striving to give you the most correct result, those things just have some overlap.

I assume it would involve wasting a lot more tokens reasoning about this. It is known that GPT uses less tokens than Claude, but Claude uses them to reason about problems more, which is part of its "secret sauce" and why so many swear by Claude Code.

Even better, if you have access to multiple models, tell it you got the code from another AI agent.

I did an experiment on this a few weekends ago and Codex for example was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"

If it's within its context window, it will know you're lying, so either compact or start a new chat (don't do this on Claude, it dings your usage, always has).

Is this a joke? Smartest people on the planet never thought about telling AI to just write better code?

Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default? I wouldn't accept a calculator application that got math wrong unless you pressed a button labeled "actually solve the problem."

> Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default?

It's not the default, because the training data is full of unmaintainable code done wrong with mistakes. People literally complain that LLMs write too many tests or add comments.

If instead of "do it right", you give it specific actionable advice of how to right code, it does surprisingly well. Newer frontier models also do a great job of mimicking the style and rigor of the surrounding codebase without prompting, if you're working in an established codebase, for better or worse.

The default isn't necessarily what ever you consider maintianable or do it right, which are ambiguous terms anyway.

You never wrote quick exploratory code? One off scripts? How is the Ai suppsed to know unless you tell it.

If you tell another person to write some code, how are they suppsed to know? If you have your boss come to you and ask you to write some code to do some data analysis are you going to spend weeks writing units tests and perfect abstractions? Or do it quick and get the data and result?

You forget that this all takes tokens from the model, so it has to be very stingy and whatever it comes up with "first" is what it goes with. I've seen people do the same as me, tell the model NOT TO GUESS but to do research first, which yields better output and saves time. Models today are better when they review the context directly, the focus shifted from it knowing everything in its training data to being able to dynamically learn new things and use that information in a meaningful way.

For example, I built up a programming language from scratch with Claude, it knows nuances about my languages syntax, and can write code in my language effectively. I did it mostly as a test. It definitely helped that my language is heavily mostly Python based.

[dead]