Hacker News

> (And actually during generation you can use speculative decoding to do better than this roofline anyways).

And more importantly batches, so taking the example from the blog post, it would be 32 tokens per each forward pass in the decoding phase.