Hacker News

What do you mean about not doing evals? Just literally that you don’t run any benchmarks or do you have something against them?

danielmarkbruce 4 months ago

He's just saying anecdotally these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered - no.

woodson 4 months ago

Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).

wcallahan 3 months ago

Yes to both comments. I said that to:

1. disclose that my method was not quantifiably measurable, because that is not important to me (speed of action and development outcomes matter more), and because

2. I’ve observed a large gap between benchmark toppers and my own results

But make no mistake, I like having the terminals scrolling live across multiple monitors so I can glance at them periodically and watch their response quality, so I do care and notice which give better or worse results.

My biggest goal right now after accuracy is achieving more natural human-like English for technical writing.