Hey some of us are on hardware (gfx906 based Radeon MI50s with 32GB of stupidly fast VRAM and basically no compute) that inference significantly faster with Q_0 and Q_1 quants