Feels like a training-data artifact. SFT and preference data are full of "here's a cleaner version of your file," not "here's the minimal 3-line diff." The model learned that bigger, more polished outputs win. Prompting around it helps a bit, but you're fighting the prior.