I am getting disproportionately good results with the models by following a process: spec → plan → critique → improve plan → implement plan.

If I may "yes, and" this: spec → plan → critique → improve plan → implement plan → code review

It may sound absurd to review an implementation with the same model you used to write it, but it works extremely well. You can optionally crank the "effort" knob (if your model has one) to "max" for the code review.
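The workflow above can be sketched as a small pipeline. This is an illustrative sketch only: `llm` is a hypothetical stand-in for whatever chat-completion call you actually use, the prompts are made up, and the `effort` parameter models the "effort knob" mentioned above (real APIs name it differently, if they expose it at all).

```python
# Sketch of spec -> plan -> critique -> improve plan -> implement -> review.
# `llm` is a hypothetical placeholder; swap in your real model client.

def llm(prompt: str, effort: str = "medium") -> str:
    # Placeholder: echoes which stage ran so the pipeline is testable offline.
    return f"[{effort}] response to: {prompt.splitlines()[0]}"

def build(spec: str, critique_rounds: int = 2) -> dict:
    plan = llm(f"Write an implementation plan for this spec:\n{spec}")
    for _ in range(critique_rounds):
        critique = llm(f"Critique this plan:\n{plan}")
        plan = llm(f"Improve the plan using this critique:\n{critique}\n\nPlan:\n{plan}")
    code = llm(f"Implement this plan:\n{plan}")
    # Crank effort to max for the review pass, per the suggestion above.
    review = llm(f"Code-review this implementation against the spec:\n{code}", effort="max")
    return {"plan": plan, "code": code, "review": review}
```

The point of the structure is just that critique/improve is a loop you run before implementing, and review is a separate pass after.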

A blanket follow-up of "are you sure this is the best way to do it?" frequently returns "Oh, you are absolutely correct, let me redo this part better."

You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.

At the end of the day it’s an autocomplete. So if you ask “are you sure?” then “oh, actually” is a statistically likely completion.

> You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.

I'm just a sample size of one, but FWIW I didn't find that this noticeably improved my results.

Not having to completely recreate all the LLM context necessary to understand the actual code and the spectrum of possible solutions (which the LLM still "knows" before you clear the session) saves a lot of time and tokens.

Interesting, I definitely see better results on a clean session. On a “dirty” session it’s more likely to go with “this is what we implemented, it’s good, we could improve it this way”, whereas on a clean session it’s a lot more likely to find actual issues or things that were overlooked in the implementation session.
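One way to picture the clean-vs-dirty difference is in what ends up in the message list. A hedged sketch, using OpenAI-style role/content dicts as an assumed message format: a "dirty" review appends the review prompt to the implementation transcript, while a clean review starts a fresh session that sees only the spec and the final diff.

```python
# Illustrative sketch (assumed OpenAI-style message dicts, made-up prompt text).

REVIEW_PROMPT = "Review this implementation. Are you sure this is the best way to do it?"

def dirty_review_messages(transcript: list[dict]) -> list[dict]:
    # Same session: the context window still contains the whole
    # back-and-forth of implementing, which biases the review.
    return transcript + [{"role": "user", "content": REVIEW_PROMPT}]

def clean_review_messages(spec: str, diff: str) -> list[dict]:
    # Fresh session: the reviewer sees only the spec and the final diff,
    # so it has no stake in defending earlier decisions.
    return [
        {"role": "system", "content": "You are a strict code reviewer."},
        {"role": "user", "content": f"Spec:\n{spec}\n\nDiff:\n{diff}\n\n{REVIEW_PROMPT}"},
    ]
```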

Can you give a little more detail how you execute these steps? Is there a specific tool you use, or is it simply different kinds of prompts?

I wrote it down here: https://x.com/BraaiEngineer/status/2016887552163119225

However, I have since condensed this into 2 prompts:

1. Write plan in Plan Mode

2. (Exit Plan Mode) Critique -> Improve loop -> Implement.

I follow a very similar workflow, with manual human review of the plans and continuous feedback loops across plan iterations.

See me in action here. It's a quick demo: https://youtu.be/a_AT7cEN_9I

similar approach