The privacy angle is not that interesting to me.
- You can find inference providers with whatever privacy terms you're looking for
- If you're using LLMs on real data (say, handling Gmail), then Google already has your data, so you might as well use the Gemini API
- Even if you're a hardcore roll-your-own-mail-server type, you probably still use a hosted search engine and have gotten comfortable with their privacy terms
On cost, the point is that an API gets you a model that's many times smarter and faster for a rounding error compared to what your Mac cost. So why bother with local, except for the cool factor?