You only get 80 Gbps of network bandwidth; there's your bottleneck right there. InfiniBand, by comparison, can give you up to 10x that.
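Rough numbers to put that in perspective. This is just a back-of-envelope sketch: the 1 GiB payload is an arbitrary example, and the InfiniBand figures are nominal per-4x-port line rates (HDR 200, NDR 400, XDR 800 Gb/s), ignoring protocol overhead.

```python
PAYLOAD_BYTES = 1 * 2**30  # example payload: 1 GiB to move between nodes

def seconds_to_transfer(num_bytes: int, link_gbps: float) -> float:
    """Time to push num_bytes over a link rated at link_gbps (gigabits/s)."""
    return num_bytes * 8 / (link_gbps * 1e9)

for name, gbps in [("80 Gbps link", 80),
                   ("InfiniBand HDR", 200),
                   ("InfiniBand NDR", 400),
                   ("InfiniBand XDR", 800)]:
    ms = seconds_to_transfer(PAYLOAD_BYTES, gbps) * 1e3
    print(f"{name:>15}: {ms:6.1f} ms per GiB")
```

So moving a GiB takes roughly 107 ms on the 80 Gbps link versus about 11 ms on an 800 Gbps InfiniBand port.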
I think the OP meant pipeline parallelism, where during inference you only transfer the activations between layers at the point where you cut the model in two, which shouldn't be too large.
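For a sense of scale, a rough sketch of what crosses the split per forward step. The hidden size (8192, roughly a 70B-class model) and fp16 activations are assumptions, not measurements:

```python
def pp_activation_bytes(batch: int, tokens: int, hidden: int,
                        bytes_per_elem: int = 2) -> int:
    """Bytes crossing the pipeline split per forward step (fp16 by default)."""
    return batch * tokens * hidden * bytes_per_elem

def transfer_ms(num_bytes: int, link_gbps: float) -> float:
    """Time to push num_bytes over a link rated at link_gbps (gigabits/s)."""
    return num_bytes * 8 / (link_gbps * 1e9) * 1e3

hidden = 8192  # assumed hidden dim for a 70B-class model
decode = pp_activation_bytes(batch=1, tokens=1, hidden=hidden)      # 1 token/step
prefill = pp_activation_bytes(batch=1, tokens=4096, hidden=hidden)  # 4k-token prompt

print(f"decode : {decode / 1024:.0f} KiB -> {transfer_ms(decode, 80):.4f} ms @ 80 Gbps")
print(f"prefill: {prefill / 2**20:.0f} MiB -> {transfer_ms(prefill, 80):.1f} ms @ 80 Gbps")
```

That works out to about 16 KiB per decoded token (microseconds on the wire), and only a few milliseconds even for a 4k-token prefill, so the per-step activation traffic is small compared to shipping weights or doing tensor parallelism over the same link.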