How many pelican riding bicycle SVGs were there before this test existed? What if the training data is being polluted with all these wonky results...
I'd argue that a model's ability to ignore, manage, or sift through the noise that other LLMs add to the training set becomes more important and valuable as time goes on.
You're correct. It's not as useful as it (ever?) was as a measure of performance...but it's fun and brings me joy.