How does computer use work? Is this a desktop app they are providing that can perform actions on your computer? I didn't see any mention of that in the post.

It’s a sandboxed compute environment, using gVisor, Firecracker, or similar, which exposes a browser environment to the LLM.

modal.com’s modal.Sandbox can be the compute layer for this. It uses gVisor under the hood.

Is there a Python or Node.js library that makes it easy to spawn secure, isolated compute environments, possibly using gVisor or Firecracker under the hood?

This could be useful to build a self-hosted "Computer use" using Ollama and a multimodal model.
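For anyone who wants a rough idea of what such a helper might look like, here is a minimal sketch in Python. It assumes Docker is installed and that gVisor's `runsc` runtime has been registered with it. The function names `build_sandbox_command` and `run_in_sandbox`, the default limits, and the image name in the usage note are all made up for illustration; this is not the API of any existing library.

```python
import subprocess
from typing import List, Sequence


def build_sandbox_command(
    image: str,
    command: Sequence[str],
    runtime: str = "runsc",   # gVisor's OCI runtime name (assumed to be registered with Docker)
    network: str = "none",    # no network access by default
    memory: str = "512m",     # illustrative memory cap
) -> List[str]:
    """Build the `docker run` argv for a throwaway, sandboxed container."""
    if not command:
        raise ValueError("command must contain at least one argument")
    return [
        "docker", "run",
        "--rm",                    # delete the container when it exits
        f"--runtime={runtime}",    # run under gVisor instead of the default runc
        f"--network={network}",
        f"--memory={memory}",
        image,
        *command,
    ]


def run_in_sandbox(
    image: str,
    command: Sequence[str],
    timeout: float = 30.0,
    **options: str,
) -> subprocess.CompletedProcess:
    """Run `command` inside the sandbox and capture its output as text."""
    argv = build_sandbox_command(image, command, **options)
    # An argv list (no shell) avoids shell-injection issues with untrusted input.
    return subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
```

With that in place, something like `run_in_sandbox("python:3.12-slim", ["python", "-c", "print(1 + 1)"])` would run the snippet in a disposable container and return its stdout, stderr, and exit code. A Firecracker-based version would need a different launcher, so treat this only as the gVisor-via-Docker variant.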

I have been [working on one](https://github.com/EtiennePerot/safe-code-execution)! The library is in [src/safecode/sandbox.py](https://github.com/EtiennePerot/safe-code-execution/blob/mas...).

See https://github.com/OpenAdaptAI/OpenAdapt for an open source alternative that includes a desktop app.

It is a Docker container providing a remote desktop you can see; they strongly recommend you also run it inside a VM.