Tbf, I've thought a decent bit about how most current AI is essentially being used to digest what already exists on a website/etc. Honestly, even just the vector search/RAG part is useful on its own, but more so with a model doing some initial filtering of the results.
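For concreteness, the retrieval half of that can be sketched in a few lines. This is a toy, assuming a bag-of-words "embedding" and cosine similarity; a real system would use learned embeddings and then hand the top hits to a model for the filtering step:

```python
# Toy retrieval sketch: bag-of-words vectors + cosine similarity.
# Real RAG uses learned embeddings; this just shows the shape of it.
from collections import Counter
import math

def embed(text):
    # Bag-of-words "vector": token -> count.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "How to add a photo to an album",
    "Changing your notification settings",
    "Uploading multiple photos at once",
]
print(retrieve("how do I add a photo", docs, k=2))
```

The model-filtering step would then sit on top of `retrieve`, reading the k candidates and discarding the irrelevant ones before answering.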
It's an odd use case - we've used language for a million-ish years, so it makes sense that it's the easiest way for us to get at information and do things.
But at the same time, it's faster for me to read than to listen, yet often slower to type than to speak. It's faster to hit one button in a familiar place to do some predetermined thing, but much slower when that button moves, gets hidden under submenus, or lives in an app or website I'm not familiar with.
On Android I constantly use the search function in the settings menu, and I feel like this will be the golden UX going forward: a regular UI and an NL interface side by side. So I can ask "how do I add a photo", get taken to the right place, and then keep adding photos in one go by following the same pattern.
Though I suppose the nicer alternative is just "add all the photos I took near the waterfall today".