> define real consistent deterministic gates and protocols

I've been experimenting with pretty much exactly that at the "routing layer" / "harness" level, before the "main" LLM ever sees the user's input. An ultra-lightweight model first returns the "user intent" (as a little JSON packet) really quickly, and from there deterministic code decides what "context" to inject into the user message template, which system prompt to use, and which model to route the assembled context "packet" to for the final response. These LLMs really are fun to play with once you get a feel for which ones do what well and where each falls short, so you can use each one around its individual strengths. :)
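
A minimal sketch of what that kind of routing layer could look like, to make the flow concrete. Everything named here is an assumption for illustration and not taken from any real harness: the intent labels, the model names, the `ROUTES` table, the `CONTEXT_STORE`, and especially `classify_intent`, which is just a keyword stub standing in for the call to the ultra-lightweight model. The point is only that once the intent JSON comes back, everything after it is plain deterministic code.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    system_prompt: str
    model: str                 # hypothetical model name
    context_keys: tuple        # which context snippets to inject


# Hypothetical routing table: intent label -> how to handle it.
ROUTES = {
    "code_help": Route("You are a careful coding assistant.", "big-code-model", ("repo_summary",)),
    "chitchat": Route("You are a friendly conversationalist.", "small-chat-model", ()),
}
FALLBACK = Route("You are a helpful assistant.", "general-model", ())

# Hypothetical context snippets that may be injected into the user message.
CONTEXT_STORE = {"repo_summary": "Project: a Flask API with a Postgres backend."}


def classify_intent(user_input: str) -> str:
    """Stub for the lightweight model: returns the intent as a JSON string."""
    text = user_input.lower()
    intent = "code_help" if ("error" in text or "bug" in text) else "chitchat"
    return json.dumps({"intent": intent})


def parse_intent(raw: str) -> str:
    """Validate the JSON packet; anything malformed becomes 'unknown'."""
    try:
        packet = json.loads(raw)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(packet, dict):
        return "unknown"
    intent = packet.get("intent")
    return intent if isinstance(intent, str) else "unknown"


def build_packet(user_input: str, raw_intent: Optional[str] = None) -> dict:
    """Deterministically pick system prompt, model, and context from the intent."""
    if raw_intent is None:
        raw_intent = classify_intent(user_input)
    intent = parse_intent(raw_intent)
    route = ROUTES.get(intent, FALLBACK)  # unknown intents fall back safely
    context = "\n".join(CONTEXT_STORE[k] for k in route.context_keys if k in CONTEXT_STORE)
    user_message = f"Context:\n{context}\n\nUser:\n{user_input}" if context else user_input
    return {
        "intent": intent,
        "model": route.model,
        "system": route.system_prompt,
        "user": user_message,
    }


if __name__ == "__main__":
    print(build_packet("I hit an error in my request handler"))
```

The fallback route is the part that keeps this a real gate: if the small model returns junk instead of valid JSON, the code still makes a predictable choice rather than passing the mess along to the main model.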