At another big-4 hyperscaler, we ended up with substantial downtime and a lossy migration because they didn’t know how to manage Kubernetes.
MicroK8s doesn’t use etcd by default (it ships its own, simpler datastore, Dqlite), which seems like a good tradeoff at single-rack scale: https://benbrougher.tech/posts/microk8s-6-months-later/
The article’s deployment has a spare rack in a second DC, and they do a monthly cutover to AWS in case the colo provider has a two-site issue.
Spending time on that would make me sleep much better than hardening a deployment of etcd running inside a single point of failure.
What other problems do you see with the article? (Their monthly time estimates seem too low to me: they’re all 10x better than what I’ve seen for well-run public cloud infrastructure comparable to their setup.)