I'm being very sarcastic. Local model evangelists seem to be operating on vibes when they say these things, completely disconnected from how models work and what the hardware requirements are.
Prices aren't going down, and consumer platforms are being shipped with less RAM so we can be sold cloud products. This isn't going to happen.
Can you please explain to me how you're going to fit 700B-1T params in 64GB of RAM? You realize there are memory requirements proportional to model size?
> Can you please explain to me how you're going to fit 700B-1T params in 64GB of RAM?
You don't. What they're saying is that today's small models (the ones that fit on consumer hw) are better than yesteryear's top models. GPT-4 was reportedly an 8x 220B (~1.76T) MoE, and today you can run a 30-120B model that beats it handily in real-world tasks.
The same goes for 4-20B models beating GPT-3 (175B), and so on.
There is a sweet spot of "good enough" that small models can reach, where you get equivalent tasks solved fully locally. They'll never touch SotA, but they'll reach the SotA of 2-4 years ago, which, depending on the task, can be "good enough".
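
To put rough numbers on the "memory proportional to model size" point, here's a back-of-the-envelope sketch. It only counts the weights (params x bits per weight / 8), and the 20% headroom for KV cache, activations and the OS is my own assumption, not a measured figure. The function names are made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fit in ram_gb.

    overhead_fraction is a guess covering KV cache, activations and the OS;
    real overhead depends on context length and the runtime used.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= ram_gb


if __name__ == "__main__":
    ram_gb = 64
    # (label, parameters in billions, bits per weight)
    cases = [
        ("1T @ 4-bit", 1000, 4),
        ("700B @ 4-bit", 700, 4),
        ("120B @ 4-bit", 120, 4),
        ("30B @ 4-bit", 30, 4),
        ("175B @ 16-bit", 175, 16),
    ]
    for label, params_b, bits in cases:
        gb = weight_memory_gb(params_b, bits)
        verdict = "fits" if fits_in_ram(params_b, bits, ram_gb) else "does not fit"
        print(f"{label}: ~{gb:.0f} GB of weights, {verdict} in {ram_gb} GB")
```

Under those assumptions a 30B model at 4-bit is comfortable on a 64GB machine, a 120B one is borderline (roughly 60 GB of weights before any overhead), and 700B-1T is nowhere close even when quantized. Which is the point: nobody is claiming the big ones fit.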