Hacker News

What do you think this particular prompt is evaluating for?

The more popular these evals become, the more likely it is that models will be trained specifically for them.