Hacker News

dmonitor a day ago

You'll have to define "modern workstation" for me, because I was under the impression that unless you've purpose-built your machine to run LLMs, a model this size is impossible to run.

wincy a day ago

You can run a 4-bit quantized 120B model on a 96GB workstation card, the Blackwell Pro workstation, which costs $7500. Considering gamers are buying the 5090 for $3300, it’s definitely attainable, even though it’s obviously expensive.

I’m running a gaming rig and could swap one in for my 5090 right now without changing anything else, so no $5000 Threadripper or $1000 HEDT motherboard with a ton of RAM slots, just a 1000-watt PSU and a dream.
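As a rough sanity check on the "4-bit 120B on 96GB" claim, here is a back-of-the-envelope sketch of weight memory only. The function names and the `headroom_gb` parameter are made up for illustration; real quantization formats store extra scale data, so the effective bits per weight are usually somewhat above 4, and the KV cache and runtime overhead come on top.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, vram_gb: float,
         headroom_gb: float = 0.0) -> bool:
    """True if weights plus an assumed headroom (KV cache, activations,
    runtime overhead) fit in the given amount of VRAM."""
    return weight_memory_gb(n_params, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # 120B parameters at exactly 4 bits per weight (idealized).
    print(weight_memory_gb(120e9, 4))             # 60.0
    # Same model on a 96 GB card, with an assumed 20 GB of headroom.
    print(fits(120e9, 4, 96, headroom_gb=20))     # True
    # Unquantized 16-bit weights would not fit.
    print(fits(120e9, 16, 96))                    # False
```

So the weights come to roughly 60 GB, which leaves on the order of 30 GB for context and overhead on a 96GB card. The 20 GB headroom figure is an assumption, not a measurement.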

Mars008 an hour ago

> 4-bit quantized 120B model on a 96GB workstation card, the Blackwell Pro workstation

Would be interesting to know how it performs in terms of quality and tokens/sec.

0x457 a day ago

When people say "modern workstation" in the context of LLMs, they usually mean consumer (prosumer?) grade hardware in a single machine, as opposed to racks of GPUs that you can't even buy as a mere mortal (minimum order size).

It doesn't mean you can grab your work laptop from 5 years ago and run it there.

int_19h a day ago

Get a Mac Studio with however much memory you need, and ideally an Ultra chip (for max memory bandwidth), and there's your workstation. I regularly run quantized 100B+ models on my M1 Ultra with 128GB of RAM.
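To illustrate why memory bandwidth matters here: during token-by-token generation, a dense model has to read roughly all of its weights for each new token, so bandwidth divided by weight size gives a crude upper bound on tokens/sec. The sketch below assumes the 800 GB/s figure Apple quotes for the M1 Ultra and an idealized 4 bits per weight; the function name is hypothetical, and real throughput will be lower (KV cache reads, compute, and software overhead are ignored). Mixture-of-experts models only touch their active parameters per token, so pass that count instead.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, active_params: float,
                              bits_per_weight: float) -> float:
    """Crude bandwidth-bound ceiling on decode speed: each generated token
    is assumed to require reading every active weight once."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Dense 120B model at 4 bits per weight, 800 GB/s memory bandwidth.
    print(round(max_decode_tokens_per_sec(800, 120e9, 4), 1))  # 13.3
```

This is a ceiling, not a prediction: it says nothing about prompt processing speed or output quality.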