> these are problems of some practical interest, not just performative/competitive maths.
FrontierMath did this a year ago. Where is the novelty here?
> a solution is known, but is guaranteed to not be in the training set for any AI.
Wrong, as the questions were posed to commercial AI models, which can solve them.
This paper violates basic benchmarking principles.
> Wrong, as the questions were posed to commercial AI models, which can solve them.
Why does this matter? As far as I can tell, since the solutions are not published, this only affects the time constant (i.e. the problems were exposed for longer than a week). It doesn't seem that I should care about that.
Because the companies have the data and their models can solve the problems -- once the questions have been provided to a company with the necessary manpower, one can no longer guarantee that the solutions are unknown to it, nor that they are absent from the training sample.