llama.cpp + Qwen3-4B running on an older PC with an AMD Radeon GPU (Vulkan backend). Users connect via the web UI. Usually around 30 tokens/sec. Usable.
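For reference, a setup like this can be sketched with llama.cpp's bundled server. This is a minimal sketch under assumptions, not the poster's exact configuration: the GGUF filename and quantization level are guesses, and it presumes llama.cpp was built with the Vulkan backend enabled.

```shell
# Build llama.cpp with the Vulkan backend (AMD Radeon GPUs work via Vulkan):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Serve the model; -ngl 99 offloads all layers to the GPU.
# The built-in web UI is then reachable at http://localhost:8080.
# Model filename/quant below are assumptions for illustration.
./build/bin/llama-server -m Qwen3-4B-Q4_K_M.gguf -ngl 99 --port 8080
```

On older GPUs the Vulkan backend is often the practical choice, since ROCm support for consumer Radeon cards is spottier than Vulkan's.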

What do they use it for? It's a very small model.

Autocomplete, I'd wager. It's a tiny model that can barely produce coherent output in many cases.