Hacker News

jsheard 2 days ago

Those only gave each GPU a single PCIe lane though, since crypto mining barely needed to move any data around. If your application doesn't fit that mould then you'll need a much, much more expensive platform.

dist-epoch 2 days ago

After you load the weights into the GPU and keep the KV cache there too, you don't need any other significant traffic.
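Here is a rough, back-of-the-envelope sketch of the numbers behind this, in Python. None of these figures come from the thread. It assumes about 0.985 GB/s of usable bandwidth on a PCIe 3.0 x1 link, a hypothetical model with a hidden size of 8192 using fp16 activations, and the model's layers split across GPUs so that only one hidden-state vector per token crosses each GPU boundary. It also ignores protocol overhead and per-transfer latency.

```python
# Back-of-the-envelope PCIe traffic estimate for layer-split (pipeline-style) inference.
# Assumptions (illustrative only, not measurements): PCIe 3.0 x1 at ~0.985 GB/s usable,
# hidden size 8192, fp16 activations (2 bytes per element), 4 GPUs, 20 GB of weights per card.

PCIE3_X1_BYTES_PER_S = 0.985e9


def load_seconds(weight_bytes: float, link_bytes_per_s: float = PCIE3_X1_BYTES_PER_S) -> float:
    """One-off cost of copying one card's share of the weights over the link."""
    return weight_bytes / link_bytes_per_s


def layer_split_bytes_per_token(hidden_size: int, bytes_per_elem: int, num_gpus: int) -> int:
    """With layers split across GPUs, one hidden-state vector per token crosses each boundary."""
    return hidden_size * bytes_per_elem * (num_gpus - 1)


if __name__ == "__main__":
    # Hypothetical: 20 GB of weights per card, loaded once at startup.
    print(f"load: {load_seconds(20e9):.1f} s per card")
    per_token = layer_split_bytes_per_token(8192, 2, num_gpus=4)
    print(f"per token: {per_token} bytes, ~{per_token / PCIE3_X1_BYTES_PER_S * 1e6:.0f} us on x1")
```

Under these assumptions the weights take roughly 20 seconds per card to load. After that, each generated token moves only about 48 KiB between GPUs, which is a few tens of microseconds even on a single lane.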

numpad0 2 days ago

Even in tensor parallel modes? I thought it could only work if you're fine with stalling all but n GPUs for n users at any given moment.
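For comparison, here is a rough sketch of the traffic tensor parallelism generates. It assumes a Megatron-style split with two all-reduces of the hidden state per transformer layer in the forward pass. It also assumes a ring all-reduce, in which each GPU sends 2(N-1)/N times the message size. The 80-layer, 8192-wide fp16 model and the PCIe 3.0 x1 figure are hypothetical. The estimate counts bytes only and ignores per-message latency, which on a narrow link would likely matter as much or more.

```python
# Rough per-token, per-GPU send volume for tensor-parallel inference.
# Assumptions (illustrative only): Megatron-style sharding with two all-reduces
# per layer in the forward pass, ring all-reduce, fp16 activations, PCIe 3.0 x1.

PCIE3_X1_BYTES_PER_S = 0.985e9


def ring_allreduce_bytes_sent(message_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter followed by all-gather)."""
    return 2 * (num_gpus - 1) / num_gpus * message_bytes


def tensor_parallel_bytes_per_token(layers: int, hidden_size: int,
                                    bytes_per_elem: int, num_gpus: int) -> float:
    """Two all-reduces of one hidden-state vector per layer, for each generated token."""
    message = hidden_size * bytes_per_elem
    return layers * 2 * ring_allreduce_bytes_sent(message, num_gpus)


if __name__ == "__main__":
    per_token = tensor_parallel_bytes_per_token(80, 8192, 2, num_gpus=4)
    print(f"{per_token / 1e6:.2f} MB sent per GPU per token, "
          f"~{per_token / PCIE3_X1_BYTES_PER_S * 1e3:.1f} ms on x1 before latency")
```

With these made-up dimensions, each GPU sends about 3.9 MB per token. That works out to roughly 4 ms of pure transfer time on a single lane. The same traffic is repeated for every token of the prompt during prefill, so this is where a single lane would hurt most.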