Hacker News

I really don't have any numbers to back this up. But it feels like the sweet spot is around ~500k context size. Anything larger then that, you usually have scoping issues, trying to do too much at the same time, or having having issues with the quality of what's in the context at all.

For me, I would say speed (not just time to first token, but a complete generation) is more important then going for a larger context size.