The increased context length is interesting.
It would be incredible to be able to feed an entire codebase into a model and say "add this feature" or "we're having a bug where X is happening, tell me why", but even then you're limited by the output token length.
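For a rough sense of that mismatch, here's a quick back-of-envelope sketch. The numbers are purely illustrative assumptions (roughly 4 characters per token and an 8k output cap, not any particular model's real limits); it just estimates how many tokens a repo would take up as input versus how much a model could emit back:

```python
import os

CHARS_PER_TOKEN = 4        # assumption: rough rule-of-thumb average, not exact
OUTPUT_TOKEN_CAP = 8_192   # assumption: illustrative output limit, varies by model

def estimate_repo_tokens(root: str, exts=(".py", ".js", ".ts", ".go", ".java")) -> int:
    """Walk a repo and roughly estimate its total token count from file sizes."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                try:
                    total_chars += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"Estimated input tokens: {tokens:,}")
    print(f"Hypothetical output cap: {OUTPUT_TOKEN_CAP:,}")
    print(f"Input is ~{tokens / OUTPUT_TOKEN_CAP:.1f}x what the model could emit back")
```

Even a mid-sized repo comes out orders of magnitude larger than what the model could write back in one response, which is why "return the whole updated codebase" isn't realistic even if it all fits in the input context.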
As others have pointed out, the more tokens you use, the less accurate the model gets and the more easily it gets confused; I've noticed this too.
We are still a ways away from being able to feed in an entire codebase and have it give you back an updated version of that codebase.