In my experience, if you're coding or doing anything else that requires precision, quantizing the KV cache is definitely not worth it.
If you're just chatting or doing less precision-sensitive things, it's absolutely worth going down to Q8, or sometimes even Q4.
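
For a rough sense of what you actually save, here's a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dim 128, 32k context) is made up for illustration, and the bits-per-value figures assume llama.cpp-style `q8_0` and `q4_0` blocks (32 values plus one fp16 scale per block), so treat the numbers as estimates, not measurements.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Estimate KV cache size in bytes for a given model shape and cache type."""
    # K and V each hold n_kv_heads * head_dim values per token, per layer.
    n_values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return n_values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

# Assumed effective bits per value, including the per-block fp16 scale:
# q8_0 = 34 bytes per 32 values, q4_0 = 18 bytes per 32 values.
BITS_PER_VALUE = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

for cache_type, bits in BITS_PER_VALUE.items():
    gib = kv_cache_bytes(**shape, bits_per_value=bits) / 2**30
    print(f"{cache_type}: {gib:.2f} GiB")
```

Under those assumptions Q8 roughly halves the cache and Q4 cuts it to a bit over a quarter, which is why it's tempting for long chats. It says nothing about the quality hit, which is the part that matters for coding.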