There are so many proxies like this now, but I can tell you from first-hand experience that this is not going to work. You cannot just route around a problem at such a high level, especially when we are talking about models that behave quite differently, have different context windows, and are tuned for different tool use. The harness does all kinds of funky things to compensate for issues (like tool call truncation), and a proxy that routes dynamically like this will work against the very strategies that make the harness work.
Interesting concept, and it may work in theory, but I cannot see this being part of a larger system.
This is not choosing between different models, really. You should check the (interesting, yet sadly very slop-padded) readme. It’s about making a binary decision (is this a hard or an easy question?) and making that decision extremely fast. They suggest putting another router behind it that chooses the model. I’m not sure how well it would work, but the idea is interesting and different from other routers.
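To make the two-stage idea concrete, here is a rough sketch of how I read it: a cheap binary gate first, then a separate router that picks the model. Everything here is my own assumption for illustration. The function names, the model names, and the word-count heuristic are hypothetical and not taken from the project; a real gate would presumably be a small, fast classifier rather than a keyword check.

```python
# Hypothetical two-stage routing sketch; names and heuristics are
# illustrative assumptions, not the project's actual API.

HARD_KEYWORDS = ("refactor", "prove", "debug")  # assumed placeholder signals


def is_hard(prompt: str) -> bool:
    """Stage 1: fast binary gate (placeholder for a real classifier)."""
    lowered = prompt.lower()
    too_long = len(lowered.split()) > 50
    has_keyword = any(word in lowered for word in HARD_KEYWORDS)
    return too_long or has_keyword


def pick_model(prompt: str, hard: bool) -> str:
    """Stage 2: a separate router that chooses a model given the verdict."""
    if not hard:
        return "small-model"  # hypothetical model name
    if "debug" in prompt.lower() or "refactor" in prompt.lower():
        return "large-code-model"  # hypothetical model name
    return "large-model"  # hypothetical model name


def route(prompt: str) -> str:
    """Run the gate first, then hand its verdict to the model router."""
    return pick_model(prompt, is_hard(prompt))
```

The point of the split is that the gate only has to answer one yes/no question, so it can be kept very fast, while the model choice lives in a second component that can be swapped out independently.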