Agreed, cross-node is the hard next step. For now single-node density gets you surprisingly far. 1000 concurrent sandboxes on one $50 box. When we need multi-node, userfaultfd with remote page fetch is the likely path.
Cool project. +1 on userfaultfd for the multi-node path. I wrote about how uffd-based on-demand restore works in the context of my Cloud Hypervisor change [1], if you're curious.
I think the main things to watch are fault storms at resume (all vCPUs hitting missing pages at once) and handler throughput if you're serving pages over the network instead of from a local mmap. It's less likely to bite when you fork a brand-new VM than, say, a VM that has been running for 5 minutes and touched a lot of memory.
Also, interestingly, Cloud Hypervisor couldn't use MAP_PRIVATE for this because it breaks VFIO/vhost-user memory sharing. Firecracker's simpler device model is nice for cases like this.
[1] https://www.shayon.dev/post/2026/65/linux-page-faults-mmap-a...