120B MoE. The 20B is dense.
As far as dense models go, it’s larger than many, but Mistral has released multiple ~120B dense models, not to mention Llama 3 405B.
How much RAM do you need to run this?!
Probably about one byte per weight (parameter), assuming 8-bit precision, plus a bit extra for the key-value cache, which scales with the size of the context window.
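Back-of-the-envelope, it looks something like this - note the helper name and the layer/head/context numbers below are illustrative placeholders, not the actual config of this model:

    # Rough serving-memory estimate: weights + KV cache. Illustrative only.
    def estimate_memory_gb(params_billions, bytes_per_param,
                           context_len=8192, layers=36, kv_heads=8, head_dim=128):
        # params in billions * bytes per param = GB (at 1e9 bytes/GB)
        weights_gb = params_billions * bytes_per_param
        # KV cache: K and V, per layer, fp16 (2 bytes) per element
        kv_gb = 2 * layers * kv_heads * head_dim * context_len * 2 / 1e9
        return weights_gb + kv_gb

    print(estimate_memory_gb(120, 1.0))  # ~121 GB: 120 GB weights + ~1.2 GB KV cache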
You can go below one byte per parameter. 4-bit quantization is fairly popular. It does affect quality - for some models more so than others - but, generally speaking, a 4-bit quantized model is still going to do significantly better than an 8-bit model with half as many parameters.
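To put numbers on that trade-off (pure weight-memory arithmetic; real quantized formats add a small overhead for scales and zero-points):

    # 4-bit 120B vs 8-bit 60B: same weight footprint, but the bigger
    # quantized model usually wins on quality.
    for params_b, bits in [(120, 8), (120, 4), (60, 8)]:
        gb = params_b * 1e9 * bits / 8 / 1e9
        print(f"{params_b}B params @ {bits}-bit -> ~{gb:.0f} GB")
    # 120B @ 8-bit -> ~120 GB
    # 120B @ 4-bit -> ~60 GB
    # 60B  @ 8-bit -> ~60 GB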