The Observation

Most AI stacks look like this:

app → model → output (+ logging, monitoring, maybe guardrails)

You can log what the model produced. You can filter outputs. You can track usage.

But you usually cannot answer:

For this specific AI decision, under what versioned policy was it allowed to execute?

Authorization tends to be system-level (“we approved the deployment”), not decision-level.

That distinction seems minor until AI systems start:

• Calling tools
• Triggering workflows
• Accessing internal APIs
• Modifying state

At that point, they’re no longer just generating text. They’re performing actions.

The Experiment

I built a thin control layer that sits between the app and the model:

app → authorization layer → model

Every request goes through OPA (Open Policy Agent) before execution.

The policy engine returns one of:

• REUSE (return a previously authorized result)
• COMPUTE (allow model execution)
• ADAPT (modify the response under policy)
• ESCALATE (require human review)

Only after that decision is the model allowed to run.
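For concreteness, here is a minimal Python sketch of that gate. `query_opa` is a stand-in for the real HTTP call to OPA's data API, and the risk-score rule inside it is invented purely for illustration; the actual policy lives in Rego on the OPA side.

```python
from enum import Enum

class Decision(Enum):
    REUSE = "reuse"        # return a previously authorized result
    COMPUTE = "compute"    # allow model execution
    ADAPT = "adapt"        # modify the response under policy
    ESCALATE = "escalate"  # require human review

def query_opa(request: dict) -> Decision:
    """Stand-in for a POST to OPA's data API.
    A real deployment evaluates the versioned Rego bundle here;
    this threshold is a made-up example."""
    if request.get("risk_score", 0) >= 0.8:
        return Decision.ESCALATE
    return Decision.COMPUTE

def handle(request: dict, run_model, cache=None):
    decision = query_opa(request)
    if decision is Decision.REUSE and cache is not None:
        return cache[request["key"]]
    if decision is Decision.ESCALATE:
        raise PermissionError("human review required before execution")
    result = run_model(request)  # model runs only after authorization
    if decision is Decision.ADAPT:
        result = {"adapted": True, **result}  # policy-driven modification
    return result
```

The point of the shape is that `run_model` is unreachable until the policy engine has answered; the model never sees requests the policy rejects.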

What Gets Stored

For each request, the system records:

• Policy ID and version
• Policy bundle digest (SHA-256)
• Risk domain + score
• Authorization decision
• Immutable ledger transaction ID

Traces are replayable. The model itself is unaware this layer exists.
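A sketch of what one stored record might look like. The field names and values here are my guesses for illustration, not the actual schema; the one load-bearing detail is hashing the policy bundle bytes, which pins the exact policy text that was in force, not just a version label.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuthzRecord:
    policy_id: str
    policy_version: str
    bundle_digest: str   # SHA-256 of the policy bundle that was in force
    risk_domain: str
    risk_score: float
    decision: str        # REUSE / COMPUTE / ADAPT / ESCALATE
    ledger_tx_id: str    # transaction id returned by the append-only ledger

def bundle_digest(bundle_bytes: bytes) -> str:
    """Digest the exact policy bytes so a trace can be replayed later."""
    return hashlib.sha256(bundle_bytes).hexdigest()

# Hypothetical record for one authorized request:
record = AuthzRecord(
    policy_id="ai.authz",
    policy_version="1.4.0",
    bundle_digest=bundle_digest(b"package ai.authz ..."),
    risk_domain="tool_call",
    risk_score=0.42,
    decision="COMPUTE",
    ledger_tx_id="tx-000123",
)
print(json.dumps(asdict(record), indent=2))
```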

Stack:

• FastAPI
• OPA
• immudb (append-only ledger)
• Dockerized deployment

What I’m Trying to Validate

Is per-decision authorization overengineering?

In practice, teams rely on:

• Prompt constraints
• Output filters
• Access controls
• Human review workflows

Maybe that’s sufficient.

But if AI systems become more agentic and start executing tool calls, I’m not sure post-hoc logging is enough.

Cloud systems evolved IAM because “we trust the service” wasn’t sufficient.

Does AI need something similar?

Open Questions

• Do you see a real need for runtime authorization before LLM execution?
• Is policy-based decision gating useful beyond regulated industries?
• Is immutable traceability necessary, or just logging?
• How would this interact with agent tool-calling frameworks?

Genuinely interested in whether this solves a real problem or is just infra over-design.

Happy to share more details if useful.