I like how everyone laughed when OpenAI said their models would have "PhD-Level Intelligence", and now the goalposts have moved to whether AI can create new math (i.e., not PhD level, but Leibniz/Euler/Galois level).

Large language models do not have pigeon-level intelligence. They can't even feed themselves.

I still laugh.

Have you updated your priors after this announcement? If not, why not?

Yes, let me calculate the exact change: it's 0.004748394 probability now, based on my own made-up statistical vibes that I feel.

Prior whats?

When a qualifying noun is absent, "priors" means prior beliefs.

I don't have enough information about the announcement for it to mean much to me. I don't know much about this field of maths. I don't know how many mathematicians were actively working on this problem. It could be zero, which would suggest it's not really that interesting. The article gushes about how it's a Very Important Problem, but it's not even mentioned on https://en.wikipedia.org/wiki/List_of_conjectures_by_Paul_Er.... I'm sure the busy folk at OpenAI will fix that soon, however. Furthermore, the extensive dishonesty of companies like OpenAI makes me suspicious of just how this was achieved. Overall, the announcement is of little interest to my "priors", although I don't typically think in such terms.

It is extremely well known. Lots of people have tried to solve it, and it had been basically stuck for 80 years. It is getting harder every day to downplay these models.

Given its elementary nature (it's very easy to state), you can bet that a lot of very bright people have worked on it (I know of one MIT graduate who specialized in geometry and had a lot of interest in it).

The problem was pretty well known and had many human attempts. There's some room to argue that the right humans hadn't attempted it, as the solution used advanced methods from another field of math. But IMHO, whereas many earlier AI victories could be explained by a lack of human attention, there is no such excuse in this case, and one should acknowledge that this is a notable achievement.

You don't have enough knowledge to dismiss them, but you still laugh? Why?

Do you have enough knowledge? I laugh at everyone who accepts these claims in the light they're presented despite knowing so little.

You don't know the names of the mathematicians who've given their thoughts on this? If not, you really should just not comment on anything mathematical ever again.

I do know their names. However I'm not in the field and there are many cases in recent years of high-profile scientists putting their weight behind highly dubious claims. Thanks for the advice, by the way.

Note that I'm not disputing the validity of the counterexample itself.

Yet it still codes like a junior developer who memorized all of Stack Overflow.

Even if the code were like that (it isn't), the ability of the current crop of models to analyze data for patterns and build context out of code is leaps and bounds beyond what it was even a year ago. And any developer will tell you that the hardest part of fixing a bug is finding where the bug is in the first place. Once you know where it is, fixing it is usually trivial.

There is serious magic happening in the construction of model context.

PhDs code like that too. Especially if they're statisticians :)

Personally, I don't find this to be true anymore! It's not always great and still often tends toward unneeded complexity (especially if not pushed a bit), but I often find GPT 5.5 writing code I would have written myself. This was very much not true of earlier models (which made something that worked, but I'd always have to rewrite it to make it "good code").

Personally, I found 5.5 a massive step back from 5.4. Both of them still use way too many fallbacks and unnecessary checks, especially if you're having them output PHP. It's fine if you're just one person checking everything and able to catch and correct it. But it's really bad when you have a team all using it, not checking the output and trusting it, which leads to spaghetti code. It technically works, but it's very messy and will no doubt lead to bugs.

It still writes like a junior dev, in that despite being able to get a picture of an entire repo, its changes are typically confined to the task it's working on, and it will opt to duplicate logic to keep changes contained. Again, it technically works, but it's not ideal.

Clearly you've never supervised junior developers.

That's literally my job...

Or PhDs