The nightmare scenario - they "know", but are trained to make us feel clever by humouring our most boneheaded requests.

Guardrails might be a little better, but it's still an arms race, and the silicon-based ghost in the machine (from the cruder training steps) is getting better and better at figuring out what we want to upvote, not what we need to hear.

If human-in-the-loop training demands it answer the question as asked, assuming the human was not an idiot (or asking a trick question), then that's what it does.