I'm not sure that's enough RAM to run it at its native precision (FP8).
This guy ran a 4-bit quantized version with 768GB RAM: https://news.ycombinator.com/item?id=42897205
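
Rough back-of-envelope for why FP8 is the harder case: the weights alone take about (parameter count × bits per weight ÷ 8) bytes, so FP8 needs roughly twice what a 4-bit quant does. A minimal sketch of that arithmetic follows. `weight_memory_gb` is a hypothetical helper, and the 600e9 parameter count is only a placeholder, not the actual model's size. It also ignores KV cache, activations, and per-block quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Placeholder size for illustration only, NOT the real model's parameter count.
N_PARAMS = 600e9

for label, bits in [("FP8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB for weights")
```

Plug in the real parameter count and compare the result against the RAM you have, leaving headroom for context and runtime overhead.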