I desperately want to be able to dictate actions for my phone to take in real time.
Stuff like:
"Open Chrome, new tab, search for xyz, scroll down, third result, copy the second paragraph, open WhatsApp, hit the back button, open group chat with friends, paste what we copied and send, send a follow-up laughing tears emoji, go back to Chrome and close out that tab"
All while being able to just quickly glance at my phone. There is already a tool like this, but I want the parsing/understanding of an LLM and super fast response times.
This new model is absurdly quick on my phone, especially for launch day. I wonder if that's extra capacity or lower demand, or if this is what we can expect going forward.
On a related note, why would you want to break your tasks down to that level? Surely it should be smart enough to work out some of those steps without being told, so you can just state your end goal.
This has been my dream for voice control of a PC for ages now. No wake word, no button press, no beeping or nagging; just fluently describe what you want to happen and it does it.
Apple tried this ages ago:
https://en.wikipedia.org/wiki/PlainTalk
Without a wake word, it would have to listen to and process all the audio it picks up. Do you really want everything captured by the mic near the device to be sent to external servers?
I might if that's what it takes to make it finally work. The fueling of the previous 15 years was not worth it, but that was then.
Is that faster to say than to do, or is it an accessibility or while-driving need?
I don't understand that use case at all. How can you tell it to do all that stuff, if you aren't sitting there glued to the screen yourself?
Because typing on mobile is slow, app switching is slow, and text selection and copy-paste are torture. Of the interactions OP listed, pretty much the only one that isn't painful is scrolling.
Plus, if the above worked, the higher level interactions could trivially work too. "Go to event details", "add that to my calendar".
FWIW, I'm starting to embrace using Gemini as a general-purpose UI for some scenarios just because it's faster. The most common one: "<paste whatever> add to my calendar please."