I just have a smart model write a testable phased plan, have a cheaper model implement them, and yet another model to review each phase. I don't see the value of adding a Rust state engine. Algorithmically verifiable things can be tests, and more nebulous things (like pattern compliance) need an LLM to do the heavy lifting and can make mistakes, so what does the state engine buy you?
the state engine is the part that can't hallucinate. even with simple steps/prompting the review model can miss things... it's still an LLM making a judgement call at the end of the day.
the state engine doesn't judge, it enforces... with code and not transformers ^_^
if a tool (or any other guardrail) isn't valid at a given state the model call gets rejected before the model sees the result. that's the gap between "a model said this is okay" vs. "the system structurally prevents this"
I don't understand. Let's stay my state is whether we are in conformance with repo patterns. Walk me through how you don't/can't hallucinate, given that you need an LLM to determine the state. For state variables that don't need LLMs, you can simply use tests and commit hooks, no?
the LLM doesn't determine the state... it requests a transition to change the state. the engine evaluates guards (data carried along the way) to decide if the transition is valid.
it (the LLM) can't skip from implementation to deploy if the guard says the tests haven't passed. the model will receive feedback that what it's tried to do is invalid and give the reasons why. it can't be skipped. it then tries to resolve that new information to make the state transition... almost like it would responding to a human in the chair denying a step.
the model can't merge if it hasn't gone through your review state, even if it wants to (it'll try though)