While the headline is clickbait (this genuinely seems like an engineering leader wanting to code again, and absolutely does not indicate any kind of exposé or lack of confidence)... the article links to OpenAI's most recent Preparedness Framework here: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbdde...

It's disappointing to me that the framework is now scoped to just three narrow Tracked Categories: Biological & Chemical, Cybersecurity, and AI Self-Improvement (the latter thresholded in terms of replicating the capabilities of leading AI researchers).

OpenAI does write about the removal of Persuasion as a category, with citations, on page 8:

> Persuasion: OpenAI prohibits the use of our products to manipulate political views as part of our Model Spec, and we build in safeguards to back this policy. We also continue to study the persuasive and relational capabilities of models (including on emotional well-being and preventing bias in our products) and monitor and investigate misuse of our products (including for influence operations). We believe many of the challenges around AI persuasion risks require solutions at a systemic or societal level, and we actively contribute to these efforts through our participation as a steering committee member of C2PA and working with lawmaker and industry peers to support state legislation on AI content provenance in Florida and California. Within our wider safety stack, our Preparedness Framework is specifically focused on frontier AI risks meeting a specific definition of severe harms, and Persuasion category risks do not fit the criteria for inclusion.

But IMO this falls short. As one of many examples, an AI that became remarkably good at influencing people, at scale, to self-harm or perpetrate violence would no longer be in scope for this research. Yet, by OpenAI's own criteria, one could easily argue that such a capability is Plausible, Measurable, Severe, Net New, and Irremediable once violence has occurred.

We live in a world where stochastic terror has become remarkably effective. Regardless of whether you feel that term is overused, it's well documented that people have historically used forums to encourage others to perpetrate mass casualty events. Relying on Model Spec adherence to prevent AI from greatly increasing the scalability of this phenomenon seems like a direction that could put many people in danger.

EDIT: this was posted before I was aware of today's mass shooting event in Florida, and I do not intend to imply any connection between my post and this event.