Hacker News

For those interested, I made some 1-bit dynamic quants at https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF

They are 74% smaller: 713GB down to 185GB.
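
If you want to sanity-check that percentage, here is a quick sketch of the arithmetic. The helper name percent_smaller is just something made up for illustration; the two sizes are the ones quoted above.

```python
def percent_smaller(original_gb: float, quantized_gb: float) -> float:
    """Return how much smaller the quantized file is, as a percentage."""
    return (1 - quantized_gb / original_gb) * 100


# Sizes quoted in the post: 713GB original, 185GB for the 1-bit dynamic quant.
reduction = percent_smaller(713, 185)
print(f"{reduction:.0f}% smaller")
```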

Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE expert layers to RAM, so the non-MoE layers fit in under 24GB of VRAM at 16K context! The rest sits in RAM and on disk.
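
To see roughly what that pattern selects, here is a small Python sketch. It assumes the part before "=" is treated as a regex searched against each tensor name (which is my understanding of how llama.cpp's -ot / --override-tensor option behaves), and the tensor names below are illustrative examples of GGUF-style naming, not a dump from the actual files. The placement helper and the "GPU" default label are made up for the example.

```python
import re

# The override string from the post: "<regex>=<device>".
OVERRIDE = ".ffn_.*_exps.=CPU"
pattern, device = OVERRIDE.rsplit("=", 1)


def placement(tensor_name: str, default: str = "GPU") -> str:
    """Return the override device if the regex matches anywhere in the name."""
    return device if re.search(pattern, tensor_name) else default


# Illustrative tensor names (assumed, following GGUF naming conventions).
names = [
    "blk.3.ffn_gate_exps.weight",   # routed MoE experts
    "blk.3.ffn_down_exps.weight",   # routed MoE experts
    "blk.3.ffn_gate_shexp.weight",  # shared expert, not matched by the pattern
    "blk.3.attn_q.weight",          # attention
    "blk.0.ffn_gate.weight",        # dense FFN layer
]

for name in names:
    print(f"{name:32s} -> {placement(name)}")
```

Only the names containing ffn_..._exps end up on CPU in this sketch; attention, dense FFN, and shared-expert tensors keep the default placement, which is why the remainder can fit in VRAM.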