> I experimented with the Q2 and Q4 quants.

Of course you get degraded performance with this.

Obviously. That's why I led with that statement.

Those are the quant levels where people with mid- to high-end hardware can run this locally at a reasonable speed, though.

In my experience, Q2 is flaky, but Q4 isn't dramatically worse.