Hacker News

For reference, I get 29 tokens/s with the same model using 12 threads on an AMD 9950X3D. My guess is that it's 2x faster because AVX-512 is, roughly speaking, 2x faster on Zen 5. Somewhat unexpectedly, increasing the number of threads decreases performance: 16 threads already perform slightly worse, and with 32 threads I only get 26.5 tokens/s.

On a 5090, the same model produces ~170 tokens/s.
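
To put those numbers side by side, here's a quick sketch of the arithmetic. The variable names are just mine for illustration, the 170 figure is approximate, and the 16-thread run is left out because no exact number was given for it.

```python
# Throughput figures quoted above, in tokens/s.
# Keys are thread counts on the 9950X3D; the 16-thread result is
# omitted because only "slightly worse" was reported for it.
cpu_tps = {12: 29.0, 32: 26.5}
gpu_tps = 170.0  # approximate ("~170") figure for the 5090

best_cpu = max(cpu_tps.values())

# How much faster the GPU is than the best CPU configuration.
gpu_speedup = gpu_tps / best_cpu

# Relative slowdown when going from 12 to 32 threads.
drop_32 = 1 - cpu_tps[32] / cpu_tps[12]

print(f"5090 vs best CPU run: {gpu_speedup:.1f}x")        # 5.9x
print(f"32 threads vs 12 threads: {drop_32:.1%} slower")  # 8.6% slower
```

So the GPU is a bit under 6x faster than the best CPU run, and going from 12 to 32 threads costs a little under 9%.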