There are sooooo many sandbox providers out there.

They do spike on different features like:

    - snapshotting and forking
    - good SSH and VPN access for end-users
    - agent-friendly features, like obscuring secrets at network layer

Then there's also the option to use libkrun to run local sandboxes on your own computer. That doesn't scratch the itch for hosted services, but works if your goal is to run agents inside isolated environments for your own work.

I've been working on some open-core stuff[1] to coordinate sandboxes, and we're making changes to have a library that lets people coordinate any number of remote or local sandboxes using any provider, kinda like how the Docker CLI works for managing containers, git repos, and coding agents. Flue[2] is another player in this space, and is more of a pure framework, while we're building it as an interactive product for using sandboxed agents and workflows.

[1] https://github.com/gofixpoint/amika/blob/main/ROADMAP.md

[2] https://flueframework.com/

We’re adopting agent-sandbox (https://agent-sandbox.sigs.k8s.io/) as we already run most of our stuff in Kubernetes and it’s been working very well. The only downside is that it’s a moving target, as it’s still essentially in development.

Why isn't libkrun good enough for hosted stuff? I use it as a podman backend in a microservice architecture.

Firecracker has more tooling for the orchestration layer that manages many sandboxes at once. Stuff like K8S integration, an external REST API control plane, more first-class support for snapshotting, etc.

You'd have to build more of that with libkrun.

The core tech of both is great though.

Firecracker has more tooling, but setting it up and managing it is also more complicated, at least for k8s workloads. Libkrun is so easy for k8s! Compile crun with libkrun support, create a symlink to crun with the name krun, done. Works like any normal pod. Firecracker with kata-containers is a lot more brittle and complicated. I've invested quite some time getting this running for a talk I'm working on.
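
For anyone scripting the symlink step, here is a minimal Python sketch. It assumes crun was already compiled with libkrun support; the `link_krun` helper name and whatever paths you pass in are hypothetical, and nothing here builds crun itself.

```python
import os
from pathlib import Path


def link_krun(crun_path: str, bin_dir: str) -> Path:
    """Create a symlink named `krun` in bin_dir that points at crun_path.

    Hypothetical helper: it only performs the symlink step and assumes the
    crun binary at crun_path was built with libkrun support.
    """
    link = Path(bin_dir) / "krun"
    if link.is_symlink():
        link.unlink()  # replace a stale symlink so the call is repeatable
    elif link.exists():
        # Refuse to overwrite a real file that happens to be named krun.
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(crun_path, link)
    return link
```

The container runtime still has to be told about the new `krun` name separately; this only covers the link itself.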

Is the talk going to be shared online anywhere? Would be interested in checking it out later!

That's super interesting - have you written up anything on this? I'd love to read it.

No, but I can give a small introduction. I installed krun from the arch package repositories: https://man.archlinux.org/man/extra/krun/krun.1.en

Then one can just pass `--runtime krun` to most podman subcommands. Alternatively, set the runtime key in the config file to make it the default.

Podman itself has "hardening" techniques, e.g. turning off the network or volumes that can be combined with this.
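
A rough sketch of what that looks like when scripted, in Python. `--runtime` and `--network none` are existing podman options; the `krun_command` helper name and the `alpine` image in the usage note are placeholders I picked, not anything from a particular setup.

```python
from typing import List, Sequence


def krun_command(image: str, command: Sequence[str], network: bool = False) -> List[str]:
    """Build a `podman run` argv that uses the krun runtime.

    Hypothetical helper: it only assembles the argument list and does not
    check that podman or krun are installed.
    """
    argv = ["podman", "run", "--rm", "--runtime", "krun"]
    if not network:
        # Hardening mentioned above: run the container with networking off.
        argv += ["--network", "none"]
    argv.append(image)
    argv.extend(command)
    return argv
```

Passing the result to `subprocess.run(krun_command("alpine", ["uname", "-a"]), check=True)` would then launch it, assuming krun is installed and registered as a runtime.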

libkrun is not production-ready compared to Firecracker, which is what the vast majority of companies doing this use.

For what exact reason is it not production ready? Or is that the stance of its maintainers?

Setting up your own is not that hard, and if you bought some compute before the Altman squeeze, it's also very cheap.

Def!

My personal belief is that the future of an "app" is a combo:

    1. micro VM
    2. agent on the VM
    3. software bundled into the VM

So, it should be stupid simple to run these local sandboxed apps/agents. Right now, not too hard for technical users (esp. with things like https://smolmachines.com/ and https://microsandbox.dev/), but not as easy as clicking an app icon or typing `/path/to/binary` in the CLI

Microsandbox claims to start faster than Docker, to be isolated from the host, and to work with OCI. Why would I still want to use Docker? The only reason I can imagine is that I actually want to be able to dynamically share resources between containers instead of dividing up VMs a priori.

Ah, the significant compute overhead: https://josecastillolema.github.io/podman-wasm-libkrun/. Much more CPU and RAM usage at worse performance.

> dynamically share resources

This has been a big pain point for me with various VM solutions I’ve tried. Having to allocate say 8GB to a sandbox, and a) having that RAM eaten up when I’m not using it and b) only having 8GB when I am using it kinda sucks.

Yes, I could stop the sandboxes when I’m not using them, but that also kinda sucks.

The performance difference in that post is due to wasm, not the container runtime, which is also stated at the end of the post.

Starting faster than a container will need evidence since starting a container on Linux is basically instant.

It is instant for me when using podman but by no means instant when using Docker. Docker running natively on Linux is still way faster than on macOS and Windows. But so far running with podman has the lowest overhead I have seen.

I was going to add a comment praising smolmachines' smolvms. Simple, fast (sub-200ms cold start), OCI-compat, and has trivial packing to standalone 0-dep executables. No need for Docker Desktop / colima / orbstack. For those who prioritize security, kernel isolation is a meaningful benefit.

No programmable network stack though, so can't pass fake credentials to things inside vm and exchange them on the boundary
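
To make the "exchange them on the boundary" idea concrete, here is a minimal Python sketch. It is not how any particular provider does it: the `swap_credentials` name and the placeholder-to-secret mapping are hypothetical, and a real setup would do this inside a proxy sitting on the VM's network path.

```python
from typing import Dict, Mapping


def swap_credentials(headers: Mapping[str, str], secrets: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of outbound headers with placeholders replaced by real secrets.

    The guest only ever sees the placeholder values; the mapping from
    placeholder to real secret lives outside the sandbox (an assumption
    of this sketch).
    """
    swapped = {}
    for name, value in headers.items():
        for placeholder, real in secrets.items():
            value = value.replace(placeholder, real)
        swapped[name] = value
    return swapped
```

So a request leaving the VM with `Authorization: Bearer FAKE_TOKEN` would go out to the internet with the real token, and the agent inside never has the real one to leak.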

heh I vibe-coded a little local app to have smolmachines and tart, for smolmachines i had to vibe-fork 2 deps deep to get GUI support working, but now i have linux desktop computers on smol machines!

also have support for lima/colima/podman

Would like to see this! Wonder how you got GUI support working because with vulkan support - you effectively enable running games cross platform

VNC/RFB as the transport, but not just a guest-side x11vnc. I forked the local SmolVM path to start libkrun with display enabled, expose the framebuffer + keyboard/pointer input, then serve that over a loopback passworded RFB endpoint. Local Machines waits for display_ready and embeds it. It has to be selected at VM start; no hot attach yet.

The interesting bit is the libkrun GPU/framebuffer/input plumbing; VNC is just how I got the pixels into the macOS app. The guest still needs a real graphical workload/compositor, e.g. Weston.

You basically described exe.dev

exe.dev is great, but the VMs are not really "apps". They are durable computers / VMs.

An example of a "sandboxed agent app" would be: give the app all your past emails. An agent scans them and finds sales emails you need to follow up on. It shows you the suggested follow-ups in a UI, and you approve/reject them. Then it mass-sends the approved emails and emits an update to your CRM with the changes.

The sandbox is deleted when the app finishes running. It's ephemeral for the lifecycle of the app. And you can re-run the same app repeatedly with new inputs, but it gets the same clean starting slate.
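
A toy Python sketch of that lifecycle, using a temporary directory as a stand-in for the sandbox. That is a big simplification (a directory gives no isolation at all), and `ephemeral_sandbox` and `run_app` are hypothetical names; the point is only the shape: fresh state in, result out, everything else thrown away.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, List, Tuple


@contextmanager
def ephemeral_sandbox() -> Iterator[Path]:
    """Yield a fresh, empty working directory and delete it on exit."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as root:
        yield Path(root)


def run_app(inputs: str) -> Tuple[str, List[str], Path]:
    """Run one 'app' invocation inside its own throwaway sandbox."""
    with ephemeral_sandbox() as box:
        leftovers = [p.name for p in box.iterdir()]  # empty on every run
        (box / "input.txt").write_text(inputs)
        output = (box / "input.txt").read_text().upper()  # stand-in for real work
    # The sandbox directory no longer exists here; only the result survives.
    return output, leftovers, box
```

Calling `run_app` twice with different inputs gives each call its own empty directory, which is the "same clean starting slate" property.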

I am building this.

+1 for microsandbox. I've been using their golang SDK (https://docs.microsandbox.dev/sdk/go/sandbox) @v0.5.10 to create sandboxes, attach them to agent sessions to execute, and then throw away, all in a raspberry pi 5 k3s cluster (as they have ARM support, if you're into that sort of thing). The microsandbox code is still a bit in flux (since it hasn't reached v1.0 API stability yet), but it's definitely worth checking out as it looks to have a solid foundation.

(edit: ahh sorry, meant to post this to above comment)

Yep, I've got one I built and it's absolutely fine for my use cases. It has a web interface/API, custom kernels and rootfs, and even the facility to set up custom Kubernetes clusters. It's been really useful for other work like testing out vulnerabilities or security features in isolated envs.

How do you do it?

It probably depends on your use case. I have a nice setup for putting claude code in a sandbox for development, but that's likely quite different from running production workloads for customers at scale.

What people aren't getting with `firecracker` is utilization. Don't get me wrong, `firecracker` is great software and it's what I'm using for lightweight virtualization, but workloads are really bursty over really short periods of time now. Even with the snapshot and restore that you can get if you're willing to hack on `firecracker` substantially, you hit walls where it's too much against the grain: this thing wasn't designed to bounce from 1 core to 32 to 8 to 16 to 4 to 32 to 1 seamlessly, and that's what it takes to get extreme utilization, even with extremely good ML on the prediction.
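
A back-of-the-envelope Python sketch of the utilization point, using the 1/32/8/16/4/32/1 core sequence from the comment. It assumes each number is the demand over an equal-length interval and that the VM is sized once at the peak; both are my simplifications, not measurements.

```python
from typing import Sequence


def utilization(demand: Sequence[int], provisioned: int) -> float:
    """Fraction of provisioned core-intervals the workload actually uses."""
    used = sum(min(d, provisioned) for d in demand)
    return used / (provisioned * len(demand))


# Core demand per interval, taken from the sequence in the comment above.
demand = [1, 32, 8, 16, 4, 32, 1]

# A rigid VM has to be sized for the peak, so most of it idles between bursts.
fixed = utilization(demand, provisioned=max(demand))
print(f"fixed at peak: {fixed:.0%}")
```

With a 32-core box that works out to 94 of 224 core-intervals used, roughly 42%, which is the gap an elastic design is trying to close.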

I am quite sure I'm not the only person working on post-firecracker KVM.

I designed my take to basically eliminate the concept of a VM being a rigid box of CPU/memory, by using CPU oversubscription, virtio ballooning for memory, and sparse ext4 for disk.

That way it can be elastic in CPU, memory and somewhat disk.
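
To illustrate why elasticity matters for packing, here is a small Python sketch comparing rigid per-VM sizing with a shared elastic pool. The demand numbers are made up for illustration, the function names are hypothetical, and this is not the commenter's implementation.

```python
from typing import Sequence


def fixed_capacity(demands: Sequence[Sequence[int]]) -> int:
    """Cores needed if every VM is a rigid box sized for its own peak."""
    return sum(max(vm) for vm in demands)


def elastic_capacity(demands: Sequence[Sequence[int]]) -> int:
    """Cores needed if VMs can grow and shrink within one shared pool.

    Assumes every VM reports demand for the same intervals (equal lengths).
    """
    return max(sum(step) for step in zip(*demands))


# Made-up core demand for three VMs over three intervals.
demands = [
    [1, 32, 8],
    [16, 4, 2],
    [2, 2, 30],
]

print(fixed_capacity(demands), elastic_capacity(demands))
```

Rigid sizing needs the sum of the peaks (78 cores here), while the shared pool only needs the peak of the sums (40), as long as the bursts don't all line up.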

How far are you on your take?

Thanks for sharing these!