The Observation
Most AI stacks look like this:
app → model → output (+ logging, monitoring, maybe guardrails)
You can log what the model produced. You can filter outputs. You can track usage.
But you usually cannot answer:
For this specific AI decision, under what versioned policy was it allowed to execute?
Authorization tends to be system-level (“we approved the deployment”), not decision-level.
That distinction seems minor until AI systems start:
• Calling tools
• Triggering workflows
• Accessing internal APIs
• Modifying state
At that point, they’re no longer just generating text. They’re performing actions.
⸻
The Experiment
I built a thin control layer that sits between the app and the model:
app → authorization layer → model
Every request goes through OPA (Open Policy Agent) before execution.
The policy engine returns one of:
• REUSE (return a previously authorized result)
• COMPUTE (allow model execution)
• ADAPT (modify the response under policy)
• ESCALATE (require human review)
Only after that decision is the model allowed to run.
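The four outcomes above can be sketched as a small routing layer. This is a minimal sketch under assumptions, not the actual implementation: it assumes the policy exposes its verdict under a `decision` key in OPA's Data API response, and `gate`, `run_model`, and the adapt transform are hypothetical names.

```python
import enum
from typing import Any

class Decision(enum.Enum):
    REUSE = "REUSE"
    COMPUTE = "COMPUTE"
    ADAPT = "ADAPT"
    ESCALATE = "ESCALATE"

def parse_decision(opa_response: dict[str, Any]) -> Decision:
    # OPA's Data API wraps policy output under "result".
    # The "decision" key is an assumption about this system's policy shape.
    result = opa_response.get("result", {})
    return Decision(result.get("decision", "ESCALATE"))  # fail closed

def gate(decision: Decision, run_model, cached=None):
    # Route the request according to the authorization decision;
    # the model only runs on COMPUTE or ADAPT.
    if decision is Decision.REUSE:
        return cached                              # previously authorized result
    if decision is Decision.COMPUTE:
        return run_model()                         # model may now execute
    if decision is Decision.ADAPT:
        return f"[policy-adapted] {run_model()}"   # placeholder transform
    raise PermissionError("escalated: human review required")
```

Note the fail-closed default: an empty or malformed policy response escalates rather than allowing execution.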
⸻
What Gets Stored
For each request, the system records:
• Policy ID and version
• Policy bundle digest (SHA-256)
• Risk domain + score
• Authorization decision
• Immutable ledger transaction ID
Traces are replayable. The model itself is unaware this layer exists.
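A record like this could be modeled as a frozen dataclass. The field names mirror the list above but are illustrative, and `digest_bundle` / `serialize` are hypothetical helpers, not the system's actual API; canonical JSON (sorted keys) makes identical records serialize identically, which matters for replayable traces.

```python
import dataclasses
import hashlib
import json

@dataclasses.dataclass(frozen=True)
class AuthzRecord:
    """One replayable trace entry per authorized request."""
    policy_id: str
    policy_version: str
    bundle_digest: str   # SHA-256 over the policy bundle bytes
    risk_domain: str
    risk_score: float
    decision: str        # REUSE / COMPUTE / ADAPT / ESCALATE
    ledger_tx_id: str    # transaction ID from the append-only ledger

def digest_bundle(bundle: bytes) -> str:
    # Pin the exact policy bytes a decision was made under.
    return hashlib.sha256(bundle).hexdigest()

def serialize(record: AuthzRecord) -> str:
    # Canonical JSON so the same record always serializes the same way.
    return json.dumps(dataclasses.asdict(record), sort_keys=True)
```

Because the dataclass is frozen and serialization is canonical, a replay can verify that a stored trace entry matches the policy bundle digest it claims.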
Stack:
• FastAPI
• OPA
• immudb (append-only ledger)
• Dockerized deployment
⸻
What I’m Trying to Validate
Is per-decision authorization overengineering?
In practice, teams rely on:
• Prompt constraints
• Output filters
• Access controls
• Human review workflows
Maybe that’s sufficient.
But if AI systems become more agentic and start executing tool calls, I’m not sure post-hoc logging is enough.
Cloud systems evolved IAM because “we trust the service” wasn’t sufficient.
Does AI need something similar?
⸻
Open Questions
• Do you see a real need for runtime authorization before LLM execution?
• Is policy-based decision gating useful beyond regulated industries?
• Is immutable traceability necessary, or is logging enough?
• How would this interact with agent tool-calling frameworks?
Genuinely interested in whether this solves a real problem or is just infra over-design.
Happy to share more details if useful.