Since model size determines die size, and die size has absolute limits as well as a correlation with yield, this approach eventually hits physical and economic limits. There was also some discussion about ganging chips.
From what I read here, the required chip size would scale linearly with the number of model weights. That alone puts a ceiling on model size.
Also, the chance that a die contains a fatal defect grows with its area. It seems like there might be room for innovation in fault tolerance here, compared to a CPU, where a single randomly flipped bit can be catastrophic.
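To make the size/yield argument concrete, here is a rough sketch. It assumes (hypothetically) that die area scales linearly with weight count at a fixed `mm2_per_billion_weights`, and that yield follows the textbook Poisson model Y = exp(-D0 * A). The constants and function names are placeholders for illustration, not figures for any real chip.

```python
import math

def die_area_mm2(num_weights: float, mm2_per_billion_weights: float) -> float:
    # Assumption: area grows linearly with the number of weights.
    return num_weights / 1e9 * mm2_per_billion_weights

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    # Simple Poisson yield model: Y = exp(-D0 * A), with A converted from mm^2 to cm^2.
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)

def cost_per_good_die(area_mm2: float, defects_per_cm2: float,
                      wafer_cost: float, wafer_diameter_mm: float = 300.0) -> float:
    # Crude dies-per-wafer estimate: wafer area / die area (ignores edge loss and scribe lines).
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    good_dies = (wafer_area / area_mm2) * poisson_yield(area_mm2, defects_per_cm2)
    return wafer_cost / good_dies

if __name__ == "__main__":
    MM2_PER_BILLION = 100.0   # hypothetical placeholder
    D0 = 0.1                  # hypothetical defect density, defects per cm^2
    WAFER_COST = 10_000.0     # hypothetical wafer cost
    for billions in (1, 2, 4, 8):
        area = die_area_mm2(billions * 1e9, MM2_PER_BILLION)
        print(f"{billions}B weights: {area:.0f} mm^2, "
              f"yield {poisson_yield(area, D0):.2f}, "
              f"cost/good die {cost_per_good_die(area, D0, WAFER_COST):.0f}")
```

Under these assumptions area grows linearly, yield falls exponentially with area, and cost per good die grows faster than linearly, which is the "economic limit" part of the argument.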
The top comment on Friday's discussion does some math on die size. https://news.ycombinator.com/item?id=47086634