Hacker News


valine 18 hours ago

Yup, exactly. In principle it helps in two ways: it speeds up inference by reducing memory bandwidth usage, and it shrinks the memory footprint of your KV cache.
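
To put rough numbers on that, here's a back-of-the-envelope sketch. The technique under discussion isn't named in this comment, so the example just assumes it cuts the number of KV heads; the model shape (32 layers, head dim 128, 4096-token context, fp16) and the function name `kv_cache_bytes` are made up for illustration. During decoding each new token has to read the whole cache, so the bytes read per token scale with the same figure as the footprint.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Approximate KV cache size in bytes (hypothetical helper)."""
    # Factor of 2: one K tensor and one V tensor are cached per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Illustrative shapes only, not taken from any specific model.
full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
reduced = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(f"full cache:    {full / 2**30:.2f} GiB")
print(f"reduced cache: {reduced / 2**30:.2f} GiB")
# Each decode step reads the entire cache, so this ratio also
# approximates the drop in KV bytes moved per generated token.
print(f"ratio:         {full / reduced:.0f}x")
```

Same arithmetic applies if the saving comes from fewer bytes per element instead of fewer heads: just change `bytes_per_elem`.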