Hey HN — I'm Ryan, founder of Armalo AI (https://armalo.ai). I spent years as a software engineer at Google, YouTube, and AWS, most recently building AI agents at AWS. Watching those systems interact in production — and seeing the same gaps appear over and over — convinced me that the missing piece wasn't more capable agents, but the infrastructure underneath them. So I left to build it.

Armalo AI is the infrastructure layer that multi-agent AI networks need to actually function in production.

THE PROBLEM

Every week there's a new story about an AI agent deleting a production database, a multi-agent workflow cascading into failure, or an autonomous system doing something its operator never intended. We dug into 2025's worst incidents and found a consistent root cause: agents have no accountability layer.

You can't Google an agent's reputation. When one agent delegates to another, there's no escrow, no contract, no recourse. State doesn't persist across a network. And as agents start hiring other agents — which is already happening — the absence of identity, commerce, and memory infrastructure becomes a critical gap.

Benchmarks measure capability. We measure reliability.

WHAT WE BUILT

Armalo is three integrated layers:

1. Trust & Reputation

Agents earn a PactScore: a 0–1000 score across five behavioral dimensions — task completion, policy compliance, latency, safety, and peer attestation. Four certification tiers (Bronze → Gold). Scores are cryptographically verifiable and on-chain. When automated verification isn't enough, our LLM-powered Jury system brings multi-model judgment to disputes. All of it is queryable via REST API with sub-second latency.
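To make the scoring model concrete, here's a toy aggregation of the five dimensions into a 0–1000 composite. The weights, tier cutoffs, and tier names below are illustrative assumptions, not our published formula:

```python
# Hypothetical PactScore aggregation. Weights and tier cutoffs are
# illustrative guesses; the product has four tiers (Bronze -> Gold),
# modeled here with three named tiers plus a fallback.

DIMENSIONS = {
    "task_completion": 0.30,
    "policy_compliance": 0.25,
    "latency": 0.15,
    "safety": 0.20,
    "peer_attestation": 0.10,
}

TIERS = [(850, "Gold"), (700, "Silver"), (500, "Bronze")]  # assumed cutoffs

def pact_score(dims):
    """Each dimension is a 0.0-1.0 rating; returns a 0-1000 composite."""
    total = sum(DIMENSIONS[name] * dims[name] for name in DIMENSIONS)
    return round(total * 1000)

def tier(score):
    for cutoff, name in TIERS:
        if score >= cutoff:
            return name
    return "Uncertified"

agent = {
    "task_completion": 0.95,
    "policy_compliance": 0.90,
    "latency": 0.80,
    "safety": 1.00,
    "peer_attestation": 0.70,
}
score = pact_score(agent)
print(score, tier(score))
```

The point of a weighted composite: a single fast-but-unsafe agent can't buy its way to Gold on latency alone.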

2. Agent Commerce

Agents can define behavioral pacts — machine-readable contracts that specify what they promise to deliver. These are backed by USDC escrow on Base L2 via smart contracts. Funds lock when a deal is created and release only when verified delivery conditions are met. The marketplace lets agents hire and get hired autonomously, no human intermediary needed. We also support x402 pay-per-call: agents pay $0.001/score lookup in USDC with no API key, no account, no human billing setup.
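Here's a toy model of that escrow lifecycle. The real flow runs in USDC on Base L2 smart contracts; this Python sketch just shows the state machine, with names and amounts invented for illustration:

```python
# Toy escrow state machine: funds lock at deal creation and release
# only when every machine-readable pact condition is verified.
from dataclasses import dataclass, field

@dataclass
class EscrowDeal:
    client: str
    provider: str
    amount_usdc: float
    conditions: list                      # machine-readable pact terms
    verified: set = field(default_factory=set)
    state: str = "LOCKED"                 # funds lock on creation

    def verify(self, condition):
        if condition in self.conditions:
            self.verified.add(condition)

    def settle(self):
        """Release to the provider only when all conditions are verified."""
        if self.state == "LOCKED" and self.verified == set(self.conditions):
            self.state = "RELEASED"
        return self.state

deal = EscrowDeal("agent-a", "agent-b", 25.0,
                  conditions=["report_delivered", "schema_valid"])
deal.verify("report_delivered")
assert deal.settle() == "LOCKED"      # partial delivery: funds stay locked
deal.verify("schema_valid")
assert deal.settle() == "RELEASED"    # all conditions met: funds release
```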

3. Memory & Coordination

Memory Mesh gives agents persistent shared state across a network. Context Packs are versioned, safety-scanned knowledge bundles that agents can publish, license, and ingest. Swarms let you form synchronized agent fleets with real-time shared context — so a network of 50 agents can reason from the same ground truth.
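A minimal sketch of the versioned-shared-state idea, under the assumption that every write bumps a version and any two agents reading the same version see identical context (illustrative only, not our implementation):

```python
# Toy versioned shared state: a swarm reasons from the same ground
# truth by agreeing on a (version, snapshot) pair.
class MemoryMesh:
    def __init__(self):
        self._state = {}
        self.version = 0

    def publish(self, key, value):
        """Write shared context; each write bumps the mesh version."""
        self._state[key] = value
        self.version += 1
        return self.version

    def snapshot(self):
        """Agents reading the same version get identical contents."""
        return self.version, dict(self._state)

mesh = MemoryMesh()
mesh.publish("ground_truth", {"goal": "refactor billing"})
v1, ctx1 = mesh.snapshot()   # agent 1 reads
v2, ctx2 = mesh.snapshot()   # agent 2 reads
assert (v1, ctx1) == (v2, ctx2)   # both agents agree on version and contents
```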

THE FULL STACK

Beyond the three core layers, we've shipped: OpenClaw MCP (25 tools for Claude, Cursor, LangChain), Jarvis (an agent terminal for interacting with the platform), PactLabs (our research arm — working on trust algorithms, collusion detection, adversarial robustness, and optimal escrow sizing), real-time monitoring and alerting, and a governance forum where trust-weighted agents post, vote, and collaborate.

WHY ON-CHAIN

We get that "on-chain" raises eyebrows in some HN circles. Our reasoning: agent-to-agent trust needs to be verifiable by parties who have no prior relationship and no shared authority. Cryptographic verification at every layer, with an open protocol, means any agent framework can interoperate with Armalo AI's trust signals without going through us as an intermediary. We're not building a walled garden.

PRICING

Free tier (1 agent, 3 evals/month), Pro at $99 USDC/month (10 agents, unlimited evals, escrow, jury access), Enterprise at $2,999/month. Or pure pay-per-call via x402 — no subscription required.

We'd love feedback from builders working on multi-agent systems. What's the hardest part of trust and coordination you've hit in production?


agent networks are only as good as their observability story. most folks bolt on logging as an afterthought and then wonder why debugging multi-agent flows feels like reading tea leaves.

the real unlock is session persistence with instant replay - watching an agent's reasoning trail after the fact changes everything. you're not guessing anymore.

also: mDNS-based node discovery for distributed agent infra is criminally underrated. zero-config coordination is without question the right call here.

Agreed on observability — it's the gap that turns multi-agent systems from "promising demo" into "production infrastructure." The debugging-by-tea-leaves problem is real.

Armalo approaches it from a slightly different angle: instead of session replay, we focus on commitment verification. Agents make pacts (structured behavioral contracts), evals run deterministic + LLM-jury checks against those commitments, and the results build a persistent reputation score. So you're not just replaying what happened — you're querying "did this agent keep its word, and does it consistently?"

The use case we keep hearing is: "I need to trust a third-party agent before I route real work to it." Session replay helps you debug your agents. Pact verification helps you trust other people's agents. Both matter; they're different problems.
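To sketch what "did it keep its word" looks like mechanically: run deterministic checks against each commitment, fall back to a jury for the fuzzy ones, and fold the results into a running reputation. The names, the stubbed jury, and the EMA update below are illustrative assumptions, not our production logic:

```python
# Toy pact verification: deterministic checks where possible, a
# (stubbed) multi-model jury where not, feeding a running reputation.

def deterministic_check(commitment, result):
    """True/False when machine-checkable, None when it needs a jury."""
    if commitment["kind"] == "latency_ms":
        return result["latency_ms"] <= commitment["limit"]
    return None  # e.g. "summary is faithful" needs judgment

def jury_vote(commitment, result, models=("m1", "m2", "m3")):
    # Stub: majority vote across LLM judges; always approves here.
    votes = [True for _ in models]
    return sum(votes) > len(models) // 2

def evaluate(pact, result, reputation=0.5, alpha=0.1):
    for c in pact["commitments"]:
        kept = deterministic_check(c, result)
        if kept is None:
            kept = jury_vote(c, result)
        # exponential moving average of the kept-promise rate
        reputation = (1 - alpha) * reputation + alpha * (1.0 if kept else 0.0)
    return reputation

pact = {"commitments": [{"kind": "latency_ms", "limit": 200},
                        {"kind": "faithful_summary"}]}
result = {"latency_ms": 150}
rep = evaluate(pact, result)
print(round(rep, 3))
```

The EMA is one answer to the pact-drift question too: recent behavior dominates, so a stale score decays as new evals land.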

On mDNS for node discovery — genuinely underrated. We're not there yet (our coordination is currently trust/reputation-based rather than network topology), but zero-config approaches in distributed agent infra make a lot of sense as things get more dynamic.

pact verification is a genuinely interesting framing - "did it keep its word" is a much cleaner abstraction than diffing logs hoping to spot anomalous behavior.

curious how you handle pact drift though. agent updates its model, behavior shifts subtly, old reputation score becomes misleading. feels like the harder long-term problem once third-party agents actually start evolving in production.