The article is worried. I'm not super worried right now -- I think OpenAI's model cards on released models show a significant amount of effort around safety, including red-team processes with outside folks; they look to me to be taking it seriously model-by-model.

Is their pDoom as high as Anthropic's? I doubt it. But that was much of the point of the drama last year -- folks sorted themselves out into a few categories.

For systemic risk, interpretability and doom analysis, Anthropic is by far the best in the world right now, to my mind. OpenAI doesn't have to do all things.

There’s some evidence the reasoning models can improve themselves, though at a glacial pace. Perhaps the stuff they’re all keeping under wraps, only hinting at every now and then, is scarier than you’d expect. (Google recently said the AI is already improving itself.)

Hyperparameter optimization in the 20th century was AI improving itself. Even more basic, gradient descent is a form of AI improving itself. The statement implies something more impressive than what it may actually mean. Far more detail would be necessary to evaluate how impressive the claim is.
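
To make that concrete, here's the trivial reading of "improving itself" -- plain gradient descent plus a random search over its own learning rate. This is just a toy Python sketch with made-up numbers, not a claim about what any lab actually does:

    # Minimal illustration of "a model improving itself" in the trivial sense:
    # gradient descent updates its own parameters, and a random search over
    # the learning rate "improves the improver". Neither is a scary capability.
    import random

    def train(lr, steps=100):
        """Fit w to minimize (w - 3)^2 with plain gradient descent."""
        w = 0.0
        for _ in range(steps):
            grad = 2 * (w - 3)   # derivative of the loss
            w -= lr * grad       # the model updating itself
        return (w - 3) ** 2      # final loss

    # Hyperparameter optimization: the system tuning its own training knob.
    best_lr, best_loss = None, float("inf")
    for _ in range(20):
        lr = 10 ** random.uniform(-3, -0.5)  # sample a learning rate
        loss = train(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss

    print(f"best lr={best_lr:.4f}, loss={best_loss:.6f}")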

https://ai-2027.com/ has a much more in-depth thought experiment, but I’m thinking of an AI that hypothesizes improvements to itself, then plans and runs experiments to confirm or reject them (roughly the loop sketched below).
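
Very roughly, that loop would look something like this -- propose a change to your own training recipe, run an experiment, keep it only if it helps. The propose_change/evaluate helpers here are made-up stand-ins, not anything any lab has described:

    # Hypothetical sketch of the non-trivial kind of self-improvement:
    # the system proposes changes to its own configuration, runs an
    # experiment for each, and keeps only the ones that actually help.
    import copy
    import random

    def propose_change(config):
        """Stand-in for the AI hypothesizing an improvement to itself."""
        new = copy.deepcopy(config)
        knob = random.choice(list(new))
        new[knob] *= random.choice([0.5, 2.0])  # tweak one training knob
        return new

    def evaluate(config):
        """Stand-in for running a real experiment (train + benchmark)."""
        # Pretend the best settings are lr=0.01 and depth=12.
        return -abs(config["lr"] - 0.01) - abs(config["depth"] - 12) / 100

    config = {"lr": 0.08, "depth": 6}
    score = evaluate(config)

    for _ in range(50):
        candidate = propose_change(config)     # hypothesize an improvement
        candidate_score = evaluate(candidate)  # run the experiment
        if candidate_score > score:            # confirm or reject
            config, score = candidate, candidate_score

    print(config, score)

The part that would make this interesting is the propose step being a capable model reasoning about its own architecture and training, not a coin flip over two knobs.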

They haven't even released model cards on some recent models.

I mean, that’s kinda the whole issue — they used to respect safety work, but now don’t. Namely:

  The Financial Times reported last week that "OpenAI slash[ed] AI model safety testing time" from months to days.

The direction is clear. This isn’t about sorting people based on personal preference for corporate structure; it’s about corporate negligence. Anthropic a) doesn’t have the most advanced models, b) has far less funding, and c) can’t do “doom analysis” (and, ideally, prevention!) on OAI’s closed-source models, especially before they’re officially released.