It takes download time + 1 minute to test speed yourself, you can try different quants, it's hard to write down a table because it depends on your system ie. ram clock etc. if you go out of gpu.

I guess it would make sense to have something like max context size/quants that fit fully on common configs with gpus, dual gpus, unified ram on mac etc.

Testing speed is easy yes, I'm mostly wondering about the quality difference between Q6 vs Q8_K_XL for example.

I haven't done benchmarking yet (plan to do them), but it should be similar to our post on DeepSeek-V3.1 Dynamic GGUFs: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs