Hacker News

In the not-too-distant future (5 years?), small LLMs will be good enough to serve as general-purpose models for most tasks. And if you have a dedicated ASIC small enough to fit in an iPhone, you have a truly local AI device, with the bonus that you get something really new to sell in every generation (i.e. access to an even more powerful model).

wmf 5 hours ago

The Taalas approach is much more expensive than the NPU that phones already have.

slow_typist 4 hours ago

Yes, but not in five years. The chips will be dirt cheap by then. We’ll get “intelligent” washing machines that discuss the amount of detergent and eventually berate us. Toasters with voice input. And really annoying elevators. Also bugs that keep an extremely low RF profile (only phoning home when the target is talking business).

wmf 4 hours ago

No, Taalas requires more silicon, which will always cost more than storing weights in DRAM.

throwthrowuknow 7 hours ago

It doesn’t need to go in the phone if it only takes a few milliseconds to respond and is cheap.

yunwal 5 hours ago

Perceptible latency is somewhere between 10 and 100ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you were expecting near-realtime responses (for example, if you were using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through some SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.

Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.

https://www.cloudping.co/
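
To make the latency argument concrete, here is a rough back-of-the-envelope sketch. The 10 and 100ms budgets come from the range above; the round-trip times and the 5ms inference figure are made-up placeholders, not measurements, and the function names are purely illustrative.

```python
# Illustrative latency-budget check. All round-trip and inference figures
# below are assumed placeholders, not measurements.

def total_latency_ms(network_rtt_ms: float, inference_ms: float) -> float:
    """One network round trip plus the time to produce the completion."""
    return network_rtt_ms + inference_ms


def fits_budget(network_rtt_ms: float, inference_ms: float, budget_ms: float) -> bool:
    """True if the response arrives within the perceptual budget."""
    return total_latency_ms(network_rtt_ms, inference_ms) <= budget_ms


if __name__ == "__main__":
    inference_ms = 5.0  # assumed: "a few milliseconds" on dedicated silicon
    # Hypothetical round-trip times; an on-device chip has no network hop.
    scenarios = {
        "on-device": 0.0,
        "nearby cloud region": 30.0,
        "distant region or poor mobile link": 150.0,
    }
    for budget_ms in (10.0, 100.0):  # the two ends of the perceptible range
        for name, rtt_ms in scenarios.items():
            verdict = "ok" if fits_budget(rtt_ms, inference_ms, budget_ms) else "too slow"
            print(f"budget {budget_ms:>5.0f}ms | {name:<36} | {verdict}")
```

Under these assumed numbers, only the on-device case fits the strict 10ms end of the range, and the distant-region case misses even the 100ms end. Real round-trip times vary a lot by location and network.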

hamdingers 7 hours ago

It does if you care about who can access your tokens.