I'm surprised they still hold out hope that this kind of mechanism could ultimately help with AI safety, when they've already observed the reward-hacking safeguard itself being duly reward-hacked. It was predictable, at least to me, after getting a very enlightening introduction to AI safety via Rob Miles' brilliant YouTube videos on the subject. See for example https://youtu.be/0pgEMWy70Qk