I think the best use case for AI `Computer Use` would be simply positioning the mouse and asking for confirmation before a click. For most use cases this is all people will want or need. If you don't know how to do something, it is basically teaching you how, rather than taking full control and doing things so fast you don't have time to stop it from going rogue.
I totally agree with you. At orango.ai, we have implemented the auto-click feature, but before it clicks, we position the cursor on the button and display a brief loading animation, allowing the user to interrupt the process.
That's a great approach! The user is still supervising but doesn't have to do anything but watch, most of the time. That's perfect.
Maybe we could have both - models to improve accessibility (e.g. for users who can't move their body well) and models to perform high level tasks without supervision.
It could be very empowering for users with disabilities to regain access to computers. But it would also be very powerful to be able to ask "use Photoshop to remove the power lines from this photo" and have the model complete the task and drop off a few samples in a folder somewhere.
Yep. I agree. The "auto-click" thing would be optional. Should be able to turn it on and off. With auto-click off it would just position the mouse and say "click here".
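To make the optional auto-click idea concrete, here's a rough sketch of how the gate might look. Everything here is made up for illustration: `ClickGate`, `move_cursor`, `click`, and the 1.5 second delay are hypothetical names and values, not orango.ai's or any real product's API.

```python
import threading
from dataclasses import dataclass
from typing import Callable

# Hypothetical callback type: takes screen coordinates (x, y).
PointerAction = Callable[[int, int], None]


@dataclass
class ClickGate:
    """Points at a target first; only clicks if auto-click is on and nobody interrupts."""

    auto_click: bool = True
    delay_s: float = 1.5  # assumed length of the "loading animation" window

    def propose(
        self,
        x: int,
        y: int,
        move_cursor: PointerAction,
        click: PointerAction,
        cancel: threading.Event,
    ) -> str:
        # Always show the user where the click would go.
        move_cursor(x, y)

        if not self.auto_click:
            # Auto-click off: just point and leave the click to the user.
            return "click here"

        # Auto-click on: wait out the delay. Event.wait returns True if the
        # user set `cancel` before the timeout, so we back off without clicking.
        if cancel.wait(timeout=self.delay_s):
            return "cancelled"

        click(x, y)
        return "clicked"
```

With `auto_click=False` the agent never clicks at all, which is the "teach me how" mode. With it on, setting the `cancel` event from a key press or mouse movement handler during the delay stops the click.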
Claude scans the page and decides which button to click before the screen layout has finished loading. By the time the user authorizes the click, the layout has shifted and the click lands on malware advertisements.
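One possible guard against this, sketched under assumptions: remember where the target was when the agent picked it, then look it up again right before clicking and refuse if it moved. `Box`, `safe_click`, `locate`, and the 2 pixel tolerance are hypothetical; a real agent would need some way to re-find the same element, which this sketch just takes as a callback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box of a UI element in screen pixels."""

    left: int
    top: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.left + self.width // 2, self.top + self.height // 2)


def safe_click(
    target_id: str,
    planned: Box,
    locate: Callable[[str], Optional[Box]],
    click: Callable[[int, int], None],
    tolerance_px: int = 2,
) -> bool:
    """Click the target only if it is still where it was when the click was planned."""
    current = locate(target_id)  # re-check the layout at click time
    if current is None:
        return False  # the element is gone, so do not click blindly

    # Largest change in position or size since the decision was made.
    drift = max(
        abs(current.left - planned.left),
        abs(current.top - planned.top),
        abs(current.width - planned.width),
        abs(current.height - planned.height),
    )
    if drift > tolerance_px:
        return False  # layout shifted; the caller should re-plan and ask again

    click(*current.center())
    return True
```

This doesn't remove the race entirely, since the layout can still shift between the re-check and the click, but it shrinks the window from "however long the user took to approve" to a few milliseconds.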
lol. If any website ever did that to me it would be the last time I ever went to it. Not a big concern for me.
YouTube constantly moves its layout seconds after the page begins to paint, so I try to click on fullscreen or whatever, and then the viewer shifts to the side and I wind up clicking a link to some other video.
Probably would have been an ad there if I didn't block those, though.
Even better, how about giving the AI the capability to move and draw and overlay on the screen with a separate virtual cursor as in a Zoom session?
I like the drawing on the screen idea. The biggest use case of that I can think of is drawing a black rectangle over all ADs!!!!
People would mostly just rubber-stamp it
But it would slow down the masses
Some people would jailbreak the agents though