...at least until we get real Test-Time Training (TTT) that encodes state directly into model weights. If vast amounts of human knowledge can be compressed into ~400GB of weights for frontier models, it's easy to imagine the same being done for our entire context.