1 million tokens is great until you notice the long-context scores fall off a cliff past 256K, and the rest is basically vibes and auto-compacting.
I bet they lack good long-context training data and need to kick off a flywheel of collecting it via their API (from willing customers).