Depends entirely on quantization. Q6_K with max context length (262144) is ~40GB of VRAM.
Q8 with the same context wouldn't fit in 48GB of VRAM, it did with 128k of context.
Depends entirely on quantization. Q6_K with max context length (262144) is ~40GB of VRAM.
Q8 with the same context wouldn't fit in 48GB of VRAM, it did with 128k of context.