>overly tuning models just to specific, already published tests, rather than focusing on making them generalize.
I think you just described SATs and other standardized tests
The SAT correlates with IQ at 0.82 to 0.86, and I do think IQ is very useful for judging intelligence.
https://gwern.net/doc/iq/high/smpy/2004-frey.pdf
It's a useful diagnostic when used in a battery of diagnostic tests of cognitive function, but to the point of this thread: it is notoriously not a good ranking mechanism.