While I disagree with OP about removing stuff from the model, there’s a valid question about tradeoffs between intelligence and price.

Deepseek Flash is almost certainly wrong more often than Opus or Fable. It also costs like 5% as much.

The question becomes if I run Deepseek in a loop to fix the mistakes it made that Opus/Fable didn’t, can it fix its own bugs in few enough tokens that it’s still cheaper?

So far, the answer seems to be “yes, by a significant margin”. A lot of tasks are simple enough that both Deepseek and Opus or Sonnet can one-shot it, which is a huge cost win for Deepseek. Even on the long tail, it’s usually like 4x the tokens on Deepseek which is still way cheaper than Opus.

There are things that Opus can do that Deepseek just won’t ever really nail, but it happens so infrequently that I just don’t worry. Like most people, most of what I do is the same sort of “3 tier app with a React frontend” that doesn’t take a rocket scientist to work out.