Building an entirely new world for agents to compute in is far more difficult than building an agent that can operate in a human world. However i'm sure over time people will start building bridges to make it easier/cheaper for agents to operate in their own native environment.

It's like another digital transformation. Paper lasted for years before everything was digitalized. Human interfaces will last for years before the conversational transformation is complete.

I am just a dilettante, but I imagined that eventually agents will be making API calls directly via browser extension, or headless browser.

I assumed everyone making these UI agents will create a library of each URL's API specification, trained by users.

Does that seem workable?