> interleaving the processing of 200ms worth of input and generation of 200ms worth of output.
How does this work? Don't LLMs/transformers need the whole context to output the next chunk of tokens?
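For reference, one way this could work without future context: a causal decoder only ever attends to the prefix it has already seen, so input tokens and output tokens can share one growing sequence, with the model alternating between ingesting ~200ms of input and emitting ~200ms of output. A toy sketch of that interleaving control flow (all names here are hypothetical stand-ins, not from the linked system):

```python
from typing import Iterator, List

class DummyCausalModel:
    """Stand-in for a causal transformer with a KV cache."""

    def __init__(self) -> None:
        self.cache: List[int] = []  # pretend KV cache: the prefix so far

    def ingest(self, token: int) -> None:
        # Prefill step: attend over cache + new token, extend the cache.
        self.cache.append(token)

    def next_token(self) -> int:
        # Decode step: "sample" conditioned on the whole prefix so far.
        token = sum(self.cache) % 100  # dummy sampling for illustration
        self.cache.append(token)
        return token

def interleaved_stream(model: DummyCausalModel,
                       input_chunks: Iterator[List[int]],
                       out_per_chunk: int = 3) -> Iterator[List[int]]:
    for chunk in input_chunks:        # each chunk ~= 200ms of input tokens
        for tok in chunk:             # ingest input for this time slice
            model.ingest(tok)
        # emit the matching ~200ms of output, token by token
        yield [model.next_token() for _ in range(out_per_chunk)]

if __name__ == "__main__":
    model = DummyCausalModel()
    chunks = iter([[1, 2, 3], [4, 5, 6]])
    for out in interleaved_stream(model, chunks):
        print(out)
```

The key point the sketch illustrates: nothing in causal attention requires future tokens, so "whole context" only ever means "everything up to now", which is exactly what streaming provides.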