Hacker News

new | ask | show | jobs

smallerize 3 hours ago [ - ]

I think it means most of the training data is short. And a lot of the long-context examples are conversations where the middle turns are less important.