Zero-Trust for Agents (Paper): Capability Grants, Tripwires, Immutable Logs

Zero-Trust for Agents (Paper): Capability Grants, Tripwires, Immutable Logs
New preprint (engrXiv DOI): https://doi.org/10.31224/5792

Agentic AI is powerful—and risky—once tools and data are in reach. This preprint lays out a Zero-Trust architecture for AI agents so you can move fast with guardrails: scoped capability grants, runtime tripwires, and immutable audit logs. It maps controls directly to EU AI Act Article 14 (human oversight) and the NIST AI RMF (Govern / Map / Measure / Manage), and includes a practical threat model, a control↔requirement matrix, KPI/SLOs, and a micro-evaluation harness based on OWASP LLM01/LLM06 and Salesforce-style prompt-injection patterns.

What’s the blueprint?

  • Capability grants (least privilege): short-lived, scoped tokens; deny-by-default tools/data; allowlists and ABAC/FGA for precision.
  • Tripwires (runtime control): rules + anomaly detection to gate or block actions; human cosign for sensitive ops; kill-switch with p95 override latency SLO.
  • Immutable logs (accountability): append-only evidence of prompts, tool calls, outputs, overrides; replay/rollback for fast incident recovery.

Why it matters (regulatory fit)

  • EU AI Act, Art. 14: effective human oversight, the ability to interrupt or stop, and documentation of oversight activity.
  • NIST AI RMF: continuous risk measurement and mitigation across the lifecycle.
    This architecture operationalizes both—without slowing delivery.

What’s inside the paper

  • Threat Model (½ page): attacker goals, vectors (LLM01/LLM06), trust boundaries, controls, residual risk.
  • Control↔Requirement Matrix: how capability tokens, tripwires, logs, and overrides satisfy Art. 14 and AI RMF functions.
  • KPI/SLOs: p95 override latency, % actions gated, audit-log completeness, incident MTTR, token hygiene.
  • Micro-evaluation harness: public, reproducible prompts (OWASP + Salesforce-style) to test the control plane’s block rate, FP rate, and latency.

Who this helps

Security architects, platform owners, SRE/ML Ops leads, and compliance/assurance teams who need deployable guardrails for agentic AI—now, not later.


Read the preprint (engrXiv DOI): https://doi.org/10.31224/5792

Kostakis Bouzoukas

Kostakis Bouzoukas

London, UK