I've been thinking of a similar thing, I just need a local model with consistent tool calling performance.

Most of my crap could just be tools and a mid-level language model interpreting the results and deciding whether to act on them.