> Computer use is such a terrible idea. It's slow, insecure, error-prone, expensive.
And yet having an agent able to use a computer on your behalf is really useful.
Recently I gave a NixOS VM to my Hermes agent and it has been a good experience. I don't really care if he destroys the machine, since I can just roll back to an earlier version, and for any meaningful data he creates for me I make sure he creates a repo, commits, and pushes to my private Gitea instance.
> And yet having an agent able to use a computer on your behalf is really useful.
It is, but there's no need for it to be viewing your screen, browsing websites and watching ads.
That stuff is for humans, not for LLMs.
Sure, I don't want an agent watching MY screen. That's why I gave him his own environment, and pretty quickly he discovered that you can open Chrome and make it render to a framebuffer, so he is able to 'view' the website. Apparently this also lets him bypass a lot of 'anti-bot' measures.
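
For anyone curious what "rendering a page without a screen" can look like, here is a rough sketch. It is not necessarily what the agent did: it uses Chrome's headless mode with `--screenshot` rather than a virtual framebuffer such as Xvfb, and the binary name `chromium` is an assumption (on some systems it is `google-chrome` or `chromium-browser`). The function names are made up for illustration.

```python
import subprocess


def screenshot_command(url: str, out_path: str, browser: str = "chromium",
                       width: int = 1280, height: int = 800) -> list[str]:
    """Build the command line for a headless screenshot of `url`.

    `browser` is an assumed binary name; adjust it for your system.
    """
    return [
        browser,
        "--headless",                       # run without a visible window
        f"--screenshot={out_path}",         # write a PNG of the rendered page
        f"--window-size={width},{height}",  # viewport size used for rendering
        url,
    ]


def take_screenshot(url: str, out_path: str, timeout_s: float = 60.0) -> bool:
    """Run the browser and report whether it exited successfully."""
    result = subprocess.run(
        screenshot_command(url, out_path),
        capture_output=True,
        timeout=timeout_s,
    )
    return result.returncode == 0
```

The resulting PNG is what a vision-capable model would then "look at".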
> And yet having an agent able to use a computer on your behalf is really useful.
I honestly cannot think of a single use case
I think the main advantage is adaptability.
Imagine you have a pretty exotic task you need to complete that involves converting a video file from one format to another.
You can use ChatGPT or something similar, and the best you will get is either a script you can run on your machine that does what you need, or it may decide to render a new video.
If you have something like Open WebUI you could configure an MCP server that converts videos and allow the model to use it to do your task. This should work, but it is quite a lot of work for something you'll only ever do once.
But if the agent has its own environment he can decide to install ffmpeg, execute the transformation and serve you the file you want.
In reality there are no new capabilities with this approach, but things get a lot more comfortable.
This doesn't require computer use, just a bash tool (and possibly fetch to get ffmpeg documentation)
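
To make "just a bash tool" concrete, here is a minimal sketch of what such a tool could look like on the host side. The name `run_bash` and the shape of the returned dict are my own assumptions, not any particular agent framework's API, and a real setup would run this inside the sandbox or VM rather than on your own machine.

```python
import subprocess


def run_bash(command: str, timeout_s: float = 60.0) -> dict:
    """Run one shell command and hand the result back to the model.

    Hypothetical tool: returns the exit code plus captured stdout/stderr,
    or an exit code of None if the command ran past the timeout.
    """
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output as str instead of bytes
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {
            "exit_code": None,
            "stdout": "",
            "stderr": f"timed out after {timeout_s}s",
        }
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

With that single tool the model can issue something like `run_bash("ffmpeg -i input.mov output.mp4")` (file names are placeholders), read the exit code and stderr, and retry with different options if the conversion fails.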
Yeah even Claude Cowork would do this, doesn't need "computer use"
Literally everything you do every day.
It's the end game of AI. Have systems trained on doing EVERYTHING you do on a computer all day. Trained by you while doing the job.
I'll give you one: Google News is pretty terrible right now. Almost all interesting news sources are paywalled, so I get recommended all kinds of weird lifestyle publications that are really horrible. With the computer use API I can just tell Gemini to look at Google News, pick the articles that look interesting, look them up on archive.is, give me the plain-text article, and construct a summary. I think that would probably work pretty well.
Have you ever done something tedious on a computer?