I built a very similar server myself [0] with a similar setup. I run different models for different purposes, but the primary one is currently Kimi 2.6. I run Kimi as the orchestrator model, with Qwen, Gemma and others handling specific tasks (sometimes loaded dynamically based on the task at hand), all exposed through the pi harness. I also use Hermes for some recurring personal tasks; it connects to the same models, hosted on my local Mac Studio.
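A minimal sketch of that orchestrator-plus-specialists split, assuming a made-up mapping from task kinds to models. The task kinds, the mapping, and both function names are hypothetical and only illustrate the idea; this is not the pi harness's actual configuration format.

```python
# Hypothetical routing table: the task kinds and model labels are
# placeholders for illustration, not real identifiers or harness config.
ORCHESTRATOR = "kimi"

SPECIALISTS = {
    "code": "qwen",     # assumed assignment, for the example only
    "vision": "gemma",  # assumed assignment, for the example only
}


def pick_model(task_kind: str) -> str:
    """Return the specialist for a task kind, else fall back to the orchestrator."""
    return SPECIALISTS.get(task_kind, ORCHESTRATOR)


def models_to_load(task_kinds: list[str]) -> list[str]:
    """Distinct models needed for a batch of tasks, orchestrator first."""
    needed = [ORCHESTRATOR]
    for kind in task_kinds:
        model = pick_model(kind)
        if model not in needed:
            needed.append(model)
    return needed
```

Loading dynamically then amounts to loading only the models returned by `models_to_load` for the tasks at hand, rather than keeping every model in memory.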
I am not even going to pretend that this is a financially reasonable option. I simply wanted to have local models. Maybe down the line, as cloud models become less subsidized, I might benefit from having a local setup, but for now it wasn't the most prudent financial decision.
But one big benefit is that I never have to worry about my account being randomly banned or about running out of quota. I still use Codex and Opus for some specific tasks, but as the tools improve, I need them less and less.