You can run a 4 bit quantized 120B model on a 96GB workstation card, the Blackwell Pro workstation, which are $7500. Considering the 5090 is bought by gamers for $3300 it’s definitely attainable, even though it’s obviously expensive.
I’m running a gaming rig and could swap one in right now without having to change anything compared to my 5090, so no $5000 Threadripper or a $1000 HEDT motherboard with a ton of RAM slots, just a 1000 watt PSU and a dream.
> 4 bit quantized 120B model on a 96GB workstation card, the Blackwell Pro workstation
Would be interesting to know how it performs in terms of quality and token/sec.