For high-availability deployments, we leverage Fly.io's global Anycast network and DNS-based health checks. When a machine in region A goes offline, Fly's Anycast routing automatically directs traffic to healthy machines in other regions without manual intervention.
For intra-region redundancy, we deploy 2 nodes per region in HA mode. If one node fails, traffic is seamlessly routed to the other node in the same region through Fly.io's internal load balancing. This provides N+1 redundancy within each region, ensuring service continuity even during single-node failures.
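To make the failover order concrete, here is a minimal sketch of the preference described above: a healthy machine in the same region first, then a healthy machine in any other region. It is a toy model and not Fly.io's actual routing logic; the `Machine` type, the `pick_machine` function, and the machine names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Machine:
    name: str      # hypothetical machine name
    region: str    # region code, e.g. "ams"
    healthy: bool  # result of the latest health check


def pick_machine(machines: List[Machine], client_region: str) -> Optional[Machine]:
    """Prefer a healthy machine in the client's region, else any healthy one."""
    healthy = [m for m in machines if m.healthy]
    local = [m for m in healthy if m.region == client_region]
    candidates = local or healthy  # fall back to other regions if none local
    return candidates[0] if candidates else None


# Two machines per region (N+1 inside each region); one node in "ams" is down.
fleet = [
    Machine("ams-1", "ams", healthy=False),
    Machine("ams-2", "ams", healthy=True),
    Machine("fra-1", "fra", healthy=True),
    Machine("fra-2", "fra", healthy=True),
]
```

With this fleet, `pick_machine(fleet, "ams")` stays inside the region and returns `ams-2`; traffic only leaves the region once both `ams` machines are unhealthy.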
I recommend adding more details like this to the website. Knowing it's Fly.io under the hood gives me significantly more confidence in your service.
Updated the site; we'll add more about it shortly.
How much of a difference would automated health checks plus programmatic DNS updates make compared with Anycast?
It depends on the setup and what your goals are. Anycast typically takes the shortest route based on network topology. That is particularly nice with something like Caddy: thanks to its large plugin ecosystem you can do a lot directly at the edge, such as building your own CDN by caching at the edge, or going all in and using caddy-lua to build apps at the edge. Gluing together DNS systems (health checks, proximity routing, edge nodes) can get you something similar, but the benefits of being "edge" largely go away as soon as you add the extra hop to a server in a different region.
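For comparison, the "health checks plus programmatic DNS updates" approach could look roughly like the sketch below. It only decides which A records should be published; the call that actually pushes them to a DNS provider is left out because it is provider-specific. The function names, IPs (documentation ranges), and health URLs are assumptions for illustration. One difference from Anycast that holds regardless of implementation: clients keep using cached records until the TTL expires, so failover is not instant.

```python
import urllib.request
from typing import Callable, Dict, List, Optional


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def plan_dns_update(
    current: List[str],
    nodes: Dict[str, str],
    check: Callable[[str], bool] = http_healthy,
) -> Optional[List[str]]:
    """Return the record set to publish, or None if nothing should change.

    `nodes` maps each node IP to its (hypothetical) health-check URL.
    """
    healthy = sorted(ip for ip, url in nodes.items() if check(url))
    if not healthy:
        return None  # fail open: never publish an empty record set
    if healthy == sorted(current):
        return None  # already up to date, skip the provider API call
    return healthy


# Hypothetical nodes, using documentation IP ranges and example hostnames.
NODES = {
    "192.0.2.10": "https://a.example/health",
    "198.51.100.20": "https://b.example/health",
}
```

Run on a timer, `plan_dns_update(current_records, NODES)` returns a new list only when a node has dropped out or come back, which keeps the number of DNS writes low. The health check is injectable so the decision logic can be tested without network access.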