With UIPath, Appian, etc. the whole field of RPA (robotic process automation) is a $XX billion industry that is built on that exact premise (that it's more feasible to do automation via GUIs than badly built/non-existing APIs).

Depending on how many GUI actions correspond to one equivalent AI orchestrated API call, this might also not be too bad in terms of efficiency.

Most of the GUIs are Web pages, though, so you could just interact directly with an HTTP server and not actually render the screen.

Or you could teach it to hack into the backend and add an API...

Oh, and on edit, "bizarre" and "multi-billion-dollar-industry" are well known not to be mutually exclusive.

>Most of the GUIs are Web pages, though, so you could just interact directly with an HTTP server and not actually render the screen.

The end goal isn't just web pages (And i wouldn't say most GUIs are web pages). Ideally, you'd also want this to be able to navigate say photoshop or any other application. And the easier your method can switch between platforms and operating systems the better

We've already built computer use around GUIs so it's just much easier to center LLMs around them too. Text is an option for the command line or the web but this isn't an easy option for the vast majority of desktop applications, nevermind mobile.

It's the same reason general purpose robots are being built into a human form factor. The human form isn't particularly special and forcing a machine to it has its own challenges but our world and environment has been built around it and trying to build a hundred different specialized form factors is a lot more daunting.

You are not familiar with this market. The goal of a UI Path is to replicate what a human does and being able to get it to production without the help of any IT/Engineering teams.

Most GUIs are in fact not web pages, that's a relatively newer development in the Enterprise side. So while some of them may be a web page, the goal is to be able to touch everything a user is doing in the workflow which very likely includes local apps.

This iteration from Anthropic is still engineering focused but you can see the future of this kind of tooling bypassing engineering/it teams entirely.