Hacker News

zargon 20 hours ago

These calculators are almost entirely useless. They don't understand specific model architectures. Even the ones that try to support only specific models (like the apxml one) get it very wrong a lot of the time.

For example, the one you linked, when I provide a Qwen3.5 27B Q_4_M GGUF [0], says that it will require 338 GB of memory with a 16-bit KV cache. That is wrong by over an order of magnitude.

[0] https://huggingface.co/bartowski/Qwen_Qwen3.5-27B-GGUF/resol...
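
To illustrate why architecture details matter so much for these estimates, here is a rough back-of-the-envelope sketch of the usual formula. Everything in it is an assumption for illustration: the function names are made up, the layer/head numbers are placeholders rather than the real configuration of any particular model, and it assumes every layer uses plain full attention (hybrid or sliding-window designs would need less KV cache than this).

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size for a plain transformer.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights."""
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    # Placeholder architecture, NOT taken from any real model card.
    layers, head_dim, ctx = 64, 128, 32768

    # Grouped-query attention: only a few KV heads are cached.
    gqa = kv_cache_bytes(layers, n_kv_heads=8, head_dim=head_dim, context_len=ctx)

    # A calculator that wrongly assumes one KV head per query head.
    naive = kv_cache_bytes(layers, n_kv_heads=64, head_dim=head_dim, context_len=ctx)

    # Assumed ~4.5 bits per weight for a 4-bit quant of a 27B-parameter model.
    weights = weight_bytes(27e9, 4.5)

    print(f"KV cache (8 KV heads):  {gqa / GIB:.1f} GiB")
    print(f"KV cache (64 KV heads): {naive / GIB:.1f} GiB")
    print(f"Weights:                {weights / GIB:.1f} GiB")
```

With these placeholder numbers, getting just the KV head count wrong inflates the cache estimate by 8x, so a calculator that misreads one or two architecture fields can easily end up an order of magnitude off.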

gdevenyi 18 hours ago

Mine does: https://github.com/gdevenyi/huggingface-estimate

zargon 17 hours ago

Excellent job with this! I tried a few combinations that completely fail on other calculators, and yours gets VRAM usage pretty much spot on. Even the performance estimate is in the ballpark of what I see with mixed VRAM / RAM workloads.

It's a shame that search is so polluted these days that it's impossible to find good tools like yours.