IMO, they're worth trying - they don't become completely braindead at Q2 or Q3, if it's a large enough model, apparently. (I've had surprisingly decent experience with Q2 quants of large-enough models. Is it as good as a Q4? No. But, hey - if you've got the bandwidth, download one and try it!)
Also, don't forget that Mixture of Experts (MoE) models perform better than you'd expect, because only a small part of the model is actually "active" at any given time - so e.g. a Qwen3-whatever-80B-A3B would be 80 billion parameters total, but only 3 billion active - worth trying if you've got enough system RAM for the 80 billion and enough VRAM for the 3.
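As a rough back-of-envelope (the quant levels and bytes-per-parameter figures below are approximate assumptions, not measured numbers), you can estimate the footprint of an 80B-total / 3B-active model like this:

```python
# Rough memory estimate for a hypothetical 80B-total / 3B-active MoE model.
# Bytes-per-parameter values are approximate for common GGUF quants.
BYTES_PER_PARAM = {"Q2_K": 0.35, "Q4_K_M": 0.57, "Q8_0": 1.06}

total_params = 80e9   # all experts plus dense layers
active_params = 3e9   # parameters actually touched per token

for quant, bpp in BYTES_PER_PARAM.items():
    total_gib = total_params * bpp / 2**30
    active_gib = active_params * bpp / 2**30
    print(f"{quant}: ~{total_gib:.0f} GiB total, ~{active_gib:.1f} GiB active per token")
```

So at a Q4-ish quant you're looking at roughly 40-45 GiB total but only a couple of GiB of weights actually read per token.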
You don't even need system RAM for the inactive experts, they can simply reside on disk and be accessed via mmap. The main remaining constraints these days will be any dense layers, plus the context size due to KV cache. The KV cache has very sparse writes so it can be offloaded to swap.
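To make the mmap point concrete, here's a minimal sketch (numpy, with a made-up flat weights file - purely illustrative, not how any particular runtime lays out its files) of how memory-mapping lets the OS page in only the expert slices you actually touch:

```python
import numpy as np

NUM_EXPERTS, DIM = 8, 1024
PATH = "experts.bin"  # hypothetical flat file of expert weight matrices

# One-time setup for the demo: write 8 random [DIM x DIM] fp16 expert matrices.
np.random.rand(NUM_EXPERTS, DIM, DIM).astype(np.float16).tofile(PATH)

# Memory-map the file read-only: nothing is loaded until it's touched.
weights = np.memmap(PATH, dtype=np.float16, mode="r",
                    shape=(NUM_EXPERTS, DIM, DIM))

def run_expert(expert_id: int, x: np.ndarray) -> np.ndarray:
    # Only this expert's slice gets faulted into the page cache;
    # untouched experts stay on disk.
    return x @ weights[expert_id]

x = np.random.rand(1, DIM).astype(np.float16)
y = run_expert(3, x)  # reads ~2 MiB of a 16 MiB file
```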
Are there any benchmarks (or even vibes!) about the token/second one can expect with this strategy?
No real fixed benchmarks AIUI, since performance will then depend on how much spare RAM you have for caching (which in turn depends on what queries you're making, how much context you're using, etc.) and how fast your storage is. Given enough RAM, you aren't really losing any performance because the OS is caching everything for you.
(But then even placing inactive experts in system RAM is controversial: you're leaving perf on the table compared to having them all in VRAM!)
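For a vibes-level estimate, the usual back-of-envelope is that decode is bandwidth-bound: tokens/sec is at most roughly (effective bandwidth) / (bytes of active weights read per token). The bandwidth numbers below are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope decode speed when bandwidth-bound (illustrative numbers only).
active_bytes = 3e9 * 0.57          # ~3B active params at a ~Q4 quant (~0.57 bytes/param)
bandwidths = {
    "NVMe SSD, cold reads (~3 GB/s)":   3e9,
    "DDR5 system RAM (~60 GB/s)":       60e9,
    "GPU VRAM (~900 GB/s)":             900e9,
}
for name, bw in bandwidths.items():
    print(f"{name}: ~{bw / active_bytes:.0f} tok/s upper bound")
```

These are upper bounds that ignore compute and assume every active byte misses the cache, but they show both why the strategy is usable at all for a 3B-active model and why all-in-VRAM is still far faster.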