I skimmed through the computer use code. It looks possible to build this with other AI providers too. For instance, you can ask the ChatGPT API to call functions for click, scroll, and type with specific parameters and execute them using OS's APIs (A11y APIs usually).
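Roughly what I have in mind, as a sketch only: declare the three actions as tools in the JSON-schema style that function-calling chat APIs accept, then route whatever tool call the model returns to a local handler. The model request itself is left out. `FakeDesktop`, `dispatch`, and the parameter names (`x`, `y`, `dy`, `text`) are names I made up for illustration, and the handlers only record the action instead of calling a real accessibility API.

```python
import json
from dataclasses import dataclass, field

# Tool declarations in the JSON-schema style used by function-calling chat APIs.
# The tool and parameter names are illustrative assumptions.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at the given screen coordinates.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer"},
                    "y": {"type": "integer"},
                },
                "required": ["x", "y"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scroll",
            "description": "Scroll vertically by dy pixels.",
            "parameters": {
                "type": "object",
                "properties": {"dy": {"type": "integer"}},
                "required": ["dy"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type",
            "description": "Type text into the focused element.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
]


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for the OS layer: records actions, performs none."""

    log: list = field(default_factory=list)

    def click(self, x: int, y: int) -> str:
        self.log.append(f"click({x}, {y})")
        return "ok"

    def scroll(self, dy: int) -> str:
        self.log.append(f"scroll({dy})")
        return "ok"

    def type_text(self, text: str) -> str:
        self.log.append(f"type({text!r})")
        return "ok"


def dispatch(desktop: FakeDesktop, name: str, arguments_json: str) -> str:
    """Run one model-issued tool call and return a result string for the model."""
    handlers = {
        "click": desktop.click,
        "scroll": desktop.scroll,
        "type": desktop.type_text,
    }
    if name not in handlers:
        return f"error: unknown tool {name!r}"
    try:
        # Function-calling APIs typically deliver arguments as a JSON string.
        args = json.loads(arguments_json)
        return handlers[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong/missing parameters are reported, not raised.
        return f"error: {exc}"


if __name__ == "__main__":
    desktop = FakeDesktop()
    dispatch(desktop, "click", '{"x": 120, "y": 340}')
    dispatch(desktop, "type", '{"text": "hello"}')
    print(desktop.log)
```

In a real loop you would pass `TOOLS` along with a screenshot or UI dump to the model, feed each returned tool call through `dispatch`, and send the result string back as the tool response. Whether a general-purpose model picks good coordinates without extra training is a separate question.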
Did I miss something? Did they have to make changes to the model for this?
> execute them using OS's APIs (A11y APIs usually)
I wonder if we'll end up with a new set of AI APIs in Windows, macOS, and Linux in the future. Maybe something that gives models an easier way to iterate through windows and the UI elements available in each.
It already exists for KDE: https://community.kde.org/Selenium