Hacker News

Octoth0rpe 20 hours ago

> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)

There seems to be at least some detail on that point.