I thought Q4_K_M was the standard. Why did you choose the 6-bit variant? Does it generate better output?
There is no standard.
The more bits per weight a quantization keeps, the better the results, but the more memory is needed. Q8 is the best.
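To put rough numbers on the memory side of that tradeoff, here is a minimal sketch. The bits-per-weight figures are assumptions based on the block layouts of the llama.cpp formats as I understand them; real GGUF files such as Q4_K_M mix several tensor types, so actual file sizes will differ somewhat. The 7B parameter count is a hypothetical example, and the estimate covers weights only, not the KV cache or runtime overhead.

```python
# Assumed approximate bits per weight for each format (weights only).
# Real GGUF files mix tensor types, so treat these as ballpark figures.
BITS_PER_WEIGHT = {
    "Q4_K": 4.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "FP16": 16.0,
    "FP32": 32.0,
}


def weight_memory_gib(n_params: float, fmt: str) -> float:
    """Rough size of the weights alone in GiB (no KV cache, no overhead)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>5}: {weight_memory_gib(n_params, fmt):6.2f} GiB")
```

Each step up in bits per weight costs proportionally more memory, which is why the 6-bit variant sits between Q4 and Q8 in both size and quality.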
FP32 is best, although I wonder if there isn’t something better I don’t know about. Q8 is for the most part equal to FP16 in practical terms by being smart about what is quantized, but iirc always slower than FP16 and FP8.