I mind, a lot. That is why I've built a cheap (in relative terms) rig that can run models up to approximately 600B parameters, though inference slows to a crawl once the model spills out of GPU memory. I would much rather be able to run open LLMs slowly than not at all.
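For a sense of why a model that size has to spill out of the GPUs, here is a rough back-of-envelope sketch. The numbers (4-bit quantization, 24 GB cards) are my assumptions, not from the comment above, and it ignores KV cache and activation memory:

```python
def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (weights only, no KV cache)."""
    return params * bits_per_weight / 8 / 1e9

# A ~600B-parameter model quantized to 4 bits per weight:
weights = model_size_gb(600e9, 4)
print(f"~600B model at 4-bit: {weights:.0f} GB of weights")

# Hypothetical 24 GB consumer GPUs: how many to hold it all?
print(f"24 GB GPUs needed to fit it: {weights / 24:.1f}")
```

Any layers that don't fit on the cards end up in system RAM, where each token is bottlenecked by CPU memory bandwidth rather than GPU bandwidth, hence the "extremely slowly".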

What hardware? I have been considering doing the same.