Which quantization are you running, and at what context size? 32 tok/s for that model on that card sounds pretty good to me!