I’ve been working on a small system to understand how applications can stay up even when backends fail.
The idea is simple: instead of sending requests to a single backend, route them through a layer that can switch to another backend if something goes wrong.
It:
checks backend health (latency, errors) avoids unhealthy servers retries requests on another backend if needed
It’s designed as:
a fast routing layer (Rust) a simple control API (Python) shared state via Redis
One thing I found interesting is that failover only works before the response starts — after that, switching isn’t possible.
Still early and mostly an experiment to understand failover and reliability better. This begun as internal experiment, after recent region outages.
Curious how others approach this problem in production systems.
[dead]