What do you mean about not doing evals? Do you literally mean that you don’t run any benchmarks, or do you have something against them?

He's just saying anecdotally these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered - no.

Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).

Yes to both comments. I said that to:

1. disclose that my method was not quantitatively measured, because that is not important to me (speed of action and development outcomes matter more to me), and because

2. I’ve observed a large gap between benchmark toppers and my own results

But make no mistake: I like to have the terminals scrolling live across multiple monitors so I can glance at them periodically and watch their response quality, so I do care, and I notice which models give better or worse results.

My biggest goal right now after accuracy is achieving more natural human-like English for technical writing.