Hacker News

Depends on the model. If you have a sparse MoE model, you can split it up across smaller nodes. Dense 30B models, though, I do not see flying anytime soon.

An Intel Pro B50 in a dumpster PC would do much better with this model (not enough RAM for a dense 30B, alas), getting close to 20 tokens a second for so much less money.
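
For anyone wondering why sparse vs. dense matters so much here, a rough back-of-the-envelope sketch follows. It assumes decoding is memory-bandwidth bound, so each generated token has to read every active weight once. The bandwidth figure, the 4-bit quantization, and the 3B-active MoE size are made-up illustrative numbers, not measurements of any hardware or model mentioned above, and the function name is hypothetical.

```python
def decode_tokens_per_second(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed, assuming each token reads all active weights once."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Illustrative assumptions only (not from the thread, not benchmarks):
BANDWIDTH_GB_S = 200.0   # hypothetical memory bandwidth
BYTES_PER_PARAM = 0.5    # roughly 4-bit quantized weights

# Dense model: all 30B parameters are read for every token.
dense_30b = decode_tokens_per_second(30, BYTES_PER_PARAM, BANDWIDTH_GB_S)

# Sparse MoE: only the routed experts are read; 3B active is an assumed figure.
moe_3b_active = decode_tokens_per_second(3, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense 30B:     {dense_30b:.1f} tok/s (upper bound)")
print(f"MoE 3B active: {moe_3b_active:.1f} tok/s (upper bound)")
```

Real numbers will come in lower once compute, KV cache reads, and any node-to-node traffic are counted, but the ratio is the point: under this assumption the ceiling scales with the active parameter count, not the total.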