Hah, yeah I'd love to know if OpenAI ran evals that were fine-grained enough to prove to themselves that putting that bit in capitals made a meaningful difference in how likely the LLM was to just provide the homework answer!
Hah, yeah I'd love to know if OpenAI ran evals that were fine-grained enough to prove to themselves that putting that bit in capitals made a meaningful difference in how likely the LLM was to just provide the homework answer!