Been reading posts like these for 3 years now. There’s multiple sites with #s. I’m willing to buy “I’m paying rent on someone’s agent harness and god knows what’s in the system prompt rn”, but in the face of numbers, gotta discount the anecdotal.
Been reading posts like these for 3 years now. There’s multiple sites with #s. I’m willing to buy “I’m paying rent on someone’s agent harness and god knows what’s in the system prompt rn”, but in the face of numbers, gotta discount the anecdotal.
You're probably right. It's probably more likely that for some period of time I forgot that I switched to the large context Opus vs Sonnet and it was not needed for the level of complexity of my work.
Yeah, why trust your actual experience over numbers? Nothing surer than synthetic benchmarks
Strawman, and, synthetic benchmark? :)