"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."
"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."
The accomplishment is cool. But all Erdos problems and other complicated mathematical problems they solved were accomplished with general-purpose models too. In fact for some of those problems, including bountied ones, they were public models. So I don't get saying this
all reasoning is .. well problem reasoning. restricting black-box AIs to specific human-defined domains because we believe that's better is such a human-ist thing to do.
I trust openAI's marketing team 100%
It seems plausible given that people have been using off the shelf 5.5 xhigh to decent success with some erdos problems. There is likely still some scaffolding around it though (like parallel sampling or separate verifier step) since it's not clear if you can just "one shot" problems like this.