Hacker News

binary0010 5 hours ago

Maybe try writing a simple script that randomly swaps between the three latest models, and see if you can tell which ones are meaningfully different without knowing which one is active?
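
Something like this rough sketch could work as a starting point. The model names are placeholders, and the part that actually sends a prompt to the hidden model is left out since it depends on your setup. It only handles the blind assignment and scoring of your guesses:

```python
import random

# Hypothetical model identifiers; substitute the three models you want to compare.
MODELS = ["model-a", "model-b", "model-c"]

def make_blind_schedule(n_trials, seed=None):
    """Return a list of (trial_id, model) pairs, with the model picked at random."""
    rng = random.Random(seed)
    return [(i, rng.choice(MODELS)) for i in range(n_trials)]

def score_guesses(schedule, guesses):
    """Fraction of trials where the guessed model matches the hidden one."""
    hits = sum(1 for i, model in schedule if guesses.get(i) == model)
    return hits / len(schedule)

if __name__ == "__main__":
    schedule = make_blind_schedule(30, seed=42)
    # In real use, route each trial to the hidden model and record your guess
    # before looking at the schedule. Random guesses stand in for that here.
    guesses = {i: random.choice(MODELS) for i, _ in schedule}
    accuracy = score_guesses(schedule, guesses)
    print(f"guess accuracy: {accuracy:.2f} (chance is about {1 / len(MODELS):.2f})")
```

If your guess accuracy stays near chance over a decent number of trials, the models probably aren't distinguishable for your workload.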

osigurdson 5 hours ago

I find the quality ebbs and flows even on the same model. My guess is that it has something to do with GPU availability, but I'm only guessing.

atq2119 5 hours ago

Unless you're systematically repeating the exact same task, the most parsimonious explanation is that you're seeing natural variation based on different tasks, random sampling of tokens, etc.
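
To illustrate how much apparent "ebb and flow" pure chance can produce, here is a toy simulation. It assumes a fixed per-task success probability (0.7 is an arbitrary number picked for illustration, not a measurement of any model) and groups tasks into sessions:

```python
import random

def session_success_rates(p, tasks_per_session, n_sessions, seed=0):
    """Simulate sessions where every task succeeds independently with probability p."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sessions):
        successes = sum(rng.random() < p for _ in range(tasks_per_session))
        rates.append(successes / tasks_per_session)
    return rates

if __name__ == "__main__":
    # p is held constant, so any spread between sessions is sampling noise only.
    rates = session_success_rates(p=0.7, tasks_per_session=10, n_sessions=20)
    print("per-session success rates:", rates)
    print("worst session:", min(rates), "best session:", max(rates))
```

With only a handful of tasks per session, the best and worst sessions can look quite different even though nothing about the underlying "model" changed.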