Is nested VMX virtualization in the Linux kernel really that stable?

The technical details are a lot more complex than most realize.

Single-level VMX virtualization is relatively straightforward, even if there are a lot of details to juggle with VMCS setup and handling exits.

Nested virtualization is a whole other animal: one now has to handle not just the extra levels but also many things the hardware normally does, plus juggle internal state during transitions between levels.
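
To make that a bit more concrete, here is a toy Python sketch of just one of those decisions: when the nested guest (L2) exits, the host hypervisor (L0) has to decide whether to handle the exit itself or reflect it to the guest hypervisor (L1), based on what L1 asked to intercept. Everything here (`Vmcs12`, `L0_OWNED`, `route_exit`, the exit-reason strings) is a made-up illustration under simplifying assumptions, not KVM's actual code or data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs12:
    """Toy stand-in for the VMCS that L1 prepared for its guest L2."""
    # Exit reasons L1 asked to intercept (hypothetical string labels).
    intercepted: set = field(default_factory=set)


# Assumption for this sketch: exits the host (L0) always keeps for itself,
# regardless of what L1 requested.
L0_OWNED = {"EXTERNAL_INTERRUPT"}


def route_exit(reason: str, vmcs12: Vmcs12) -> str:
    """Return which level handles an exit taken while L2 was running."""
    if reason in L0_OWNED:
        return "L0"   # host business; L1 never sees it
    if reason in vmcs12.intercepted:
        return "L1"   # reflect: emulate a VM exit from L2 into L1
    return "L0"       # L1 did not ask for it; L0 handles it and resumes L2


if __name__ == "__main__":
    l1_config = Vmcs12(intercepted={"CPUID", "EXTERNAL_INTERRUPT"})
    for reason in ("CPUID", "EXTERNAL_INTERRUPT", "HLT"):
        print(reason, "->", route_exit(reason, l1_config))
```

Even this trivial version shows where the state juggling comes from: the "reflect to L1" branch means software has to fake, field by field, the VM exit that hardware would have delivered in the single-level case.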

The LKML is filled with discussions and debates where very sharp contributors are trying to make sense of how it would work.

Amazon turning the feature on is one thing. It working 100% perfectly is quite another…

Fair concern, but this has been quietly production-stable on GCP and Azure since 2017 — that's 8+ years at cloud scale. The LKML debates you're referencing are mostly about edge cases in exotic VMX features (nested APIC virtualization, SGX passthrough), not the core nesting path that workloads like Firecracker and Kata actually exercise.

The more interesting signal is that AWS is restricting this to 8th-gen Intel instances only (c8i/m8i/r8i). They're likely leveraging specific microarchitectural improvements in those chips for VMCS shadowing — picking the hardware generation where they can guarantee their reliability bar rather than enabling it broadly and dealing with errata on older silicon. That's actually the careful engineering approach you'd want from a cloud provider.

It's been around for almost 15 years, and it has been stable enough for several providers to roll it out in production over the past 10 years (GCP and Azure in 2017).

AWS is just late to the game because they've rolled so much of their own stack instead of adapting open-source solutions and contributing back to them.