Hacker News

LLMs are memory-bandwidth bound, not compute bound.

This is incorrect: prompt processing is compute bound.

LLMs are bound by both, and which factor dominates depends on the hardware.

This is only true for some parts of the time cost function.
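
For anyone who wants to put rough numbers on this, here is a back-of-the-envelope roofline sketch. It assumes a forward pass costs about 2 FLOPs per parameter per token and reads every weight once, and it ignores KV cache and activation traffic. The function name `step_time` and the hardware figures (100 TFLOP/s, 1 TB/s) are made up for illustration, not measurements of any real chip.

```python
def step_time(params, tokens, peak_flops, bandwidth, bytes_per_param=2):
    """Estimate one forward pass over `tokens` tokens processed together.

    Simplifying assumptions: ~2 FLOPs per parameter per token, weights are
    read from memory once per pass, KV cache and activations are ignored.
    Returns (seconds, "compute" or "memory").
    """
    compute_s = 2 * params * tokens / peak_flops
    memory_s = params * bytes_per_param / bandwidth
    bound = "compute" if compute_s > memory_s else "memory"
    return max(compute_s, memory_s), bound


# Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
BANDWIDTH = 1e12
PARAMS = 7e9  # a 7B-parameter model with 2-byte weights

# Token generation: one token per pass.
decode = step_time(PARAMS, 1, PEAK_FLOPS, BANDWIDTH)
# Prompt processing: 512 prompt tokens in one pass.
prefill = step_time(PARAMS, 512, PEAK_FLOPS, BANDWIDTH)

print("decode :", decode)
print("prefill:", prefill)
```

Under these assumptions the crossover sits at peak_flops * bytes_per_param / (2 * bandwidth) tokens per pass, which is 100 for the numbers above. Generating one token at a time lands on the memory side and a long prompt lands on the compute side, so both earlier comments can be right depending on the phase and the hardware ratio.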