There aren't many working on it though, definitely not enough given how many resources are going into building AI.

AI safety teams at these labs are largely focused on surface-level measures and aren't empowered to stop the company's progress. I was surprised when Anthropic initially held Mythos back from the public, but it was always a temporary measure to give controlled access rather than a pause to make meaningful improvements in AI safety.

The only measures we see are the surface-level ones, because those are the only ones that sort of work.

Alignment is a hard, possibly impossible problem. Anthropic's gambit is that they luck into a solution before the paperclip maximizers take over.

But that's exactly my point. If they legitimately feared that AGI, or whatever the bar is, could significantly impact all of humanity in a bad way, they wouldn't be okay with saying "well, this coat of paint sort of slows down the rust."

Either it's a dangerous technology or it isn't, and if it is, then surface-level fixes that only kind of work are completely unacceptable.

But that's the point. Assuming alignment is not possible and the risk caused by unaligned models is real, shouldn't then all effort go into preventing such models from existing in the first place?

...which would actually be an easy-to-solve problem unless you go out of your way to build such a model.

How does building said models prevent them from existing?

Prevention should look a lot more like a global moratorium with whatever enforcement is necessary to stop and prevent any breaches of the agreement.

Edit: I did misread your comment on first pass, we may be in agreement here. Sorry!

> Prevention should look a lot more like a global moratorium with whatever enforcement is necessary to stop and prevent any breaches of the agreement.

Yep, that was my point. Either the ostensible danger stemming from the models is not real, in which case this stuff is moot anyway, or it is real, in which case why are we building them in the first place?

I wish Ilya and crew would chime in.