article :
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.
article :
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.
Yeah, though it's not great marketing. Especially for hiring interpretability researchers. Their own alignment research has reward model interpretability, personality features and so on (see https://alignment.openai.com ). It just seems like a different department wrote it, which is a shame because I'd love to read about goblin feature vectors and functional emotions.
someone woke up on the wrong side of the goblin today
real goblin-y response