New GPUs come out all the time. New phones come out (if you count all the manufacturers) all the time. We do not need to always buy the new one.
Current open-weight models under 20B parameters are already capable of being useful. At even 1K tokens/second, they would change what it means for us to interact with them, or for models to interact with the computer.
Hm, yeah, I guess if they stick to shitty models it works out. I was talking about the models people use to actually do things, instead of shitposting from openclaw and getting reminders about their next dentist appointment.
The trick with small models is what you ask them to do. I am working on a data extraction app (from emails and files) that works entirely locally. I applied for the Taalas API because it would be an awesome fit.
dwata: Entirely Local Financial Data Extraction from Emails Using Ministral 3 3B with Ollama: https://youtu.be/LVT-jYlvM18
https://github.com/brainless/dwata
Considering that enamel regrowth is still experimental (only Curodont exists as a commercial product), those dentist appointments are probably the most important routine healthcare appointments in your life. Pick something that is actually useless.
If you need a full-blown LLM with root access to all your devices to remind you about an appointment, something is very wrong with your life.