Anything special about these questions? Are they unsolved by humans? I am not working in mathematics research, so it's hard for me to tell their importance.

The abstract of the article is very short, and seems to answer both of your questions pretty clearly.

This is what is special about them:

> a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now;

I.e. these are problems of some practical interest, not just performative/competitive maths.

And this is what is known about the solutions:

> the answers are known to the authors of the questions but will remain encrypted for a short time.

I.e. a solution is known, but is guaranteed to not be in the training set for any AI.
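
The abstract only says the answers "remain encrypted", without specifying a scheme, but the general idea is a commit-reveal: publish a commitment to the answer now, so you can later prove the answer existed before any model attempted the problem, without revealing it in the meantime. A minimal sketch of that idea (the function names and salting scheme here are my own illustration, not the paper's protocol):

```python
import hashlib
import secrets

def commit(answer: str) -> tuple[str, str]:
    """Commit to an answer: publish the digest now, keep the salt and answer secret."""
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + answer).encode()).hexdigest()
    return digest, salt  # digest is published; salt stays private until the reveal

def verify(answer: str, salt: str, digest: str) -> bool:
    """Reveal step: anyone can check the claimed answer against the published digest."""
    return hashlib.sha256((salt + answer).encode()).hexdigest() == digest

# Commit today, reveal after the evaluation window closes.
published, secret_salt = commit("42")
assert verify("42", secret_salt, published)
assert not verify("41", secret_salt, published)
```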

> I.e. a solution is known, but is guaranteed to not be in the training set for any AI.

Not a mathematician, and obviously you guys understand this better than I do. One thing I can't understand is how they're going to judge whether a solution was AI-written or human-written. I mean, a human could also potentially solve the problem and pass it off as AI's work. You might ask why a human would want to do that. Normal mathematicians might not, but mathematicians hired by Anthropic or OpenAI might want to, to pass it off as an AI achievement.

Well, I think the paper answers that too. These problems are intended as a tool for honest researchers to use for exploring the capabilities of current AI models, in a reasonably fair way. They're specifically not intended as a rigorous benchmark to be treated adversarially.

Of course a math expert could solve the problems themselves and lie by saying that an AI model did it. In the same way, somebody with enough money could secretly film a movie and then claim that it was made by AI. That's outside the scope of what this paper is trying to address.

The point is not to score models based on how many of the problems they can solve. The point is to look at the models' responses and see how good they are at tackling the problem. And that's why the authors say that ideally, people solving these problems with AI would post complete chat transcripts (or the equivalent) so that readers can assess how much of the intellectual contribution actually came from AI.

> these are problems of some practical interest, not just performative/competitive maths.

FrontierMath did this a year ago. Where is the novelty here?

> a solution is known, but is guaranteed to not be in the training set for any AI.

Wrong, as the questions were posed to commercial AI models and they can solve them.

This paper violates basic benchmarking principles.

> Wrong, as the questions were posed to commercial AI models and they can solve them.

Why does this matter? As far as I can tell, since the solution is not public, this only affects the time constant (i.e. the problems were known for longer than a week). That doesn't seem like something I should care about.

Because the companies have the data and can solve them -- once you provide the questions to a company with the necessary manpower, you can no longer guarantee that the solution is not known, and not contained in the training sample.