Hacker News

My experience isn't consistent with there being significant (or really any) quality gains in actual real-world usage, for me or the team I'm on.

The plural of anecdote is not data. What are your evals telling you?

If that's what you're asking: the % of accepted, actionable prompts isn't up when I use Opus 4.7/4.6/4.8.