> 6x reduction in memory usage for KV caches and up to 8x boost in speed
Keep in mind that you're quoting marketing material, and those numbers largely come from unfair baselines (e.g. comparing 4-bit against 32-bit to get the "8x speed").
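
To make the baseline point concrete, here's a rough sketch of how much of a headline ratio falls out of bit width alone. It assumes the workload is memory-bandwidth-bound, so time scales with bytes moved; that assumption, the `bitwidth_ratio` helper, and the 16-bit comparison point are mine for illustration, not from the quoted material.

```python
# Rough sketch: the ratio you get purely from storing fewer bits per value.
# Assumption (not from the quoted material): the workload is bandwidth-bound,
# so the time ratio roughly tracks the bytes-moved ratio.

def bitwidth_ratio(baseline_bits: int, quantized_bits: int) -> float:
    """Bytes moved per value at the baseline width vs. the quantized width."""
    return baseline_bits / quantized_bits

print(bitwidth_ratio(32, 4))  # 8.0 -> the "8x" from a 32-bit baseline
print(bitwidth_ratio(16, 4))  # 4.0 -> same method against a 16-bit baseline
```

So the same 4-bit scheme looks half as impressive just by changing what it's compared against.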