Hacker News

dannyw 3 hours ago

Wouldn't you need to re-run across lots of samples (even for a single eval/bench) to avoid outsized impacts from just bad luck?
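A minimal sketch of the intuition behind re-running. It assumes each benchmark question passes or fails like an independent coin flip with a fixed per-question probability (for example, because of sampling temperature). All function names and numbers here are hypothetical. The code measures how much the reported score moves between repetitions of the whole procedure. With independent runs, that spread should shrink roughly as 1/sqrt(number of runs).

```python
import random
import statistics


def run_eval(pass_probs, rng):
    """One hypothetical eval run: each question passes independently
    with its own probability, and the score is the fraction passed."""
    passed = sum(1 for p in pass_probs if rng.random() < p)
    return passed / len(pass_probs)


def averaged_score(pass_probs, n_runs, rng):
    """Mean accuracy over n_runs independent re-runs of the same eval."""
    return statistics.mean(run_eval(pass_probs, rng) for _ in range(n_runs))


def spread(pass_probs, n_runs, trials=200, seed=0):
    """Standard deviation of the averaged score across many repetitions
    of the whole procedure, i.e. how much luck alone moves the result."""
    rng = random.Random(seed)
    scores = [averaged_score(pass_probs, n_runs, rng) for _ in range(trials)]
    return statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical benchmark: 100 questions, each solved 70% of the time.
    pass_probs = [0.7] * 100
    for n_runs in (1, 4, 16):
        print(n_runs, round(spread(pass_probs, n_runs), 4))
```

Under these assumptions, a single run has a spread near sqrt(0.7 * 0.3 / 100), or about 4.6 points, and each 4x increase in runs should roughly halve it. Re-running the same questions only averages away this run-to-run sampling noise. It does not reduce the error that comes from which questions happen to be in the benchmark.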