Hacker News

This exact problem comes up in AI alignment. It’s not enough to just look at the legible outputs.

If you are going to trust someone with important responsibilities, you want them to “show their working” and convince you that they are not faking it.

The difference, of course, is what Alice and Bob do when the mask is off, when no one is looking.