> 6x reduction in memory usage for KV caches and up to 8x boost in speed

Keep in mind that you're quoting marketing material that's largely based on unfair baseline testing (like comparing 4-bit against 32-bit to get "8x speed").
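To put rough numbers on the baseline point, here's a back-of-the-envelope sketch. It assumes speed is limited purely by how many bits get moved per element, which is a simplification, and the 16-bit baseline is my own hypothetical comparison point, not something from the quoted material.

```python
# Illustrative arithmetic only: assumes the work is purely memory-bandwidth bound,
# so the best-case speedup equals the ratio of bits per element.
def ideal_ratio(baseline_bits: int, quantized_bits: int) -> float:
    """Best-case speedup when going from baseline_bits to quantized_bits."""
    return baseline_bits / quantized_bits

# The 4-bit vs 32-bit comparison mentioned above.
vs_32bit = ideal_ratio(32, 4)

# Hypothetical 16-bit baseline (an assumption, not from the quoted material).
vs_16bit = ideal_ratio(16, 4)

print(f"4-bit vs 32-bit: {vs_32bit}x")
print(f"4-bit vs 16-bit: {vs_16bit}x")
```

Same 4-bit format, but the headline number halves just by picking a narrower baseline.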

https://www.youtube.com/watch?v=haoAI2lIZ74