And that's at unusable speeds; running it decently fast at int4 takes about triple that amount.
Now, as the other replies say, you should very likely run a quantized version anyway.
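
For a rough sense of why quantizing helps, here's a back-of-the-envelope sketch. It only counts the weights themselves (no KV cache, activations, or runtime overhead, so real usage will be higher), and the 70B parameter count is a made-up example, not the model being discussed here.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Hypothetical parameter count, for illustration only.
    n_params = 70e9
    for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{name}: ~{weight_memory_gib(n_params, bits):.1f} GiB for weights")
```

Going from fp16 to int4 cuts the weight footprint to a quarter, which is usually the difference between fitting in fast memory and spilling over to something much slower.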