I've been poking at running LLMs in the browser. It feels like we're close (<1 year) to seeing real use cases there.

Ubiquity and device coverage are what will take longest. That largely depends on how much we can shrink models while keeping similar performance, and how much we can accelerate inference on mobile devices. This feels like it's a bit further out (<3 years?)