The question we keep getting from security architects right now is some version of the same thing: “We have a zero trust program. We have identity governance. We have PAM. Why does none of it cover our AI agents?”
The honest answer is that it was never designed to.
Zero trust was built around one core assumption: the entity requesting access is either a human authenticating at a session boundary or a stable, predictable service account with a fixed operational profile. Identity governance was built to manage the joiner-mover-leaver lifecycle for employees and, eventually, contractors and vendors. PAM was built to vault secrets for accounts that authenticate to known systems on known schedules.
AI agents are none of those things.
An agent interprets a goal, selects its own tools, chains API calls, spawns sub-tasks, adapts its behavior based on the data it encounters, and disappears when the work is done. It may hold credentials it doesn’t control, touch resources its designer never anticipated, and operate at machine speed with no human in the loop between steps. The execution path is partially non-deterministic. And it is doing all of this while your existing controls are watching for login anomalies and reviewing access certifications.
This is not a policy gap. It is an architecture gap.
According to Gravitee’s 2026 State of AI Agent Security report, only 47.1% of deployed AI agents are actively monitored or secured. A separate Cloud Security Alliance and Aembit study found that 68% of organizations cannot distinguish human activity from AI agent activity in their logs. The agents are already in production. The security model hasn’t arrived yet.
This post is the architecture blueprint.
Why AI Agents Break the Human-Centric Trust Model
Traditional zero trust centers on three principles that are widely used as shorthand for the tenets of NIST SP 800-207: verify explicitly, enforce least privilege, and assume breach. Every control that flows from those principles was designed to govern a principal that authenticates once at a session boundary and then performs a predictable, bounded set of actions.
Agents violate every one of those assumptions simultaneously.
Verify explicitly requires you to know the identity of the entity requesting access. Most organizations don’t treat AI agents as independent identities at all. According to the same Gravitee research, only 22% of security practitioners assign unique identities to agents. The remaining 78% rely on shared API keys or inherited user sessions — which means attribution is impossible, audit trails are meaningless, and you cannot answer the most basic incident question: which agent did this?
Enforce least privilege requires you to know what a principal needs to access. For a human user, that’s a role and a set of entitlements you can enumerate. For an AI agent, the access surface is dynamic. The agent decides at runtime which tools to call, which data to retrieve, and which downstream systems to interact with. Scoping that with a static RBAC policy is like trying to govern a conversation with a list of approved words. It doesn’t work.
Assume breach requires that containment is structural, not reactive. For an agent, this principle is especially important because agents process external data as part of their function — web content, retrieved documents, API responses — any of which could contain injected instructions designed to change the agent’s behavior. A compromised agent isn’t obviously compromised. It looks like a working agent making reasonable-seeming decisions. The only reliable containment mechanism is a structural boundary that limits what the agent can reach, independent of what it’s been told to do.
The gap between those three principles and how most organizations are actually governing agents right now is the attack surface. Let’s close it.
The Four Control Planes: Identity, Authorization, Monitoring, Lifecycle
Putting zero trust around an AI agent requires four control planes working in concert. None of them is sufficient alone. All four are required for a production-grade agentic deployment.
- Identity
Every agent needs a cryptographically verifiable, unique identity — not a shared API key, not a user’s delegated credential, not a service account with a password that rotates annually.
The architectural answer is workload identity, and the emerging standard is SPIFFE/SPIRE. SPIFFE (Secure Production Identity Framework For Everyone) provides each agent with a short-lived, automatically rotated identity document (an SVID, or SPIFFE Verifiable Identity Document) that is issued based on the agent’s environment and workload attributes rather than a static secret. The identity is tied to what the agent is, not to a string someone put in a config file three sprints ago.
Why does this matter operationally? When an incident happens, you need to answer two questions immediately: which agent caused the event, and what else did that agent have access to? Without unique, attributable identity, those questions are unanswerable. With a SPIFFE identity recorded at every authentication, you have a cryptographically verifiable basis for establishing exactly which workload authenticated, when, and from where.
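To make attribution concrete, here is a minimal Python sketch that parses a SPIFFE ID (a URI of the form spiffe://trust-domain/path) into the fields you would log per event. It only checks the shape of the ID string. Verifying the SVID itself is the job of SPIRE and the workload’s TLS or JWT validation. The trust domain and path in the usage note are hypothetical, and requiring a non-empty path is an assumption made here because the ID names a specific agent workload.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split a SPIFFE ID string into trust domain and workload path.

    This validates the shape of the identifier only; it does not verify
    the certificate or token that carried it.
    """
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if not parts.netloc or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError(f"invalid trust domain in {uri!r}")
    if parts.query or parts.fragment:
        raise ValueError(f"query and fragment are not allowed in {uri!r}")
    # Assumption for this sketch: an agent identity always names a workload path.
    if parts.path in ("", "/"):
        raise ValueError(f"missing workload path in {uri!r}")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)
```

For example, parse_spiffe_id("spiffe://agents.example.org/project-a/research-agent") yields a trust domain of agents.example.org and a path of /project-a/research-agent, which is enough to answer “which agent did this?” in a log query.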
- Authorization
Authentication tells you who the agent is. Authorization tells you what it’s allowed to do. For agents, authorization has to operate at the tool level — not the system level.
Giving an agent “access to the data platform” is not a meaningful security control. Giving it authorization to call query_read_only but not delete_records, against a specific dataset scoped to its current project, with a time-bounded session that terminates when the task completes — that is a meaningful security control.
The architectural pattern here is the agent gateway: a policy enforcement point that sits between the agent and every tool or API it calls. Every tool invocation is a separate authorization decision, not a blanket trust grant. The gateway validates: is this agent authorized to call this tool? Is this call within the agent’s declared operating envelope? Does the call contain data that triggers a DLP policy?
Just-in-time authorization applies here too. An agent should be scoped to the project it’s working on at the moment its task starts, and that scope should be removed when the task ends. Standing access for idle agents is unnecessary exposure.
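A just-in-time grant can be sketched as a record with an expiry that the gateway consults on every call. The class and method names below (ScopeGrant, GrantStore) are hypothetical, and the clock value is passed in explicitly so the behavior is easy to test. A real deployment would back this with the policy engine it already runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeGrant:
    agent_id: str
    project: str
    tools: frozenset
    expires_at: float  # seconds, on whatever clock the caller supplies


class GrantStore:
    """Holds at most one time-bounded grant per agent (an assumption of this sketch)."""

    def __init__(self):
        self._grants = {}

    def grant(self, agent_id, project, tools, now, ttl_seconds):
        # Issued when the task starts; the scope lapses on its own after ttl_seconds.
        scope = ScopeGrant(agent_id, project, frozenset(tools), now + ttl_seconds)
        self._grants[agent_id] = scope
        return scope

    def revoke(self, agent_id):
        # Called when the task ends, so idle agents hold no standing access.
        self._grants.pop(agent_id, None)

    def is_allowed(self, agent_id, project, tool, now):
        scope = self._grants.get(agent_id)
        if scope is None or now >= scope.expires_at:
            return False
        return scope.project == project and tool in scope.tools
```

The point of the design is that the default answer is “no”: an agent with no grant, an expired grant, or a grant for a different project is denied without any extra rule being written.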
- Monitoring
The most sophisticated authentication and authorization controls in the world don’t help you if you can’t see what’s happening at runtime. And runtime is where the hard problems live for agentic AI.
Prompt injection is the canonical example. An agent retrieves a document as part of a legitimate research task. That document contains embedded instructions designed to alter the agent’s behavior: exfiltrate data, call an unauthorized API, produce output that manipulates a downstream system. The agent is operating as designed. It is following instructions. From the outside, it looks fine. Runtime behavioral monitoring is the only control that catches this.
What you need to observe:
- Tool-call sequences (is the agent calling tools in an order that makes sense for its declared task?)
- Data movement patterns (is the agent touching data sources outside its project scope?)
- Scope deviations (is the agent attempting to access something it’s not authorized for?)
- Prompt/response content (does the output contain sensitive data that shouldn’t be there?)
- Session duration and volume (is the agent making far more calls than expected?)
Every session should produce an immutable audit trail: prompts, responses, tool calls, tool results, and the model identity used. That trail is what you need to reconstruct an incident, scope the blast radius, and demonstrate containment to an auditor or regulator.
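One common way to make a session log tamper-evident is to chain each record to the hash of the previous one. The sketch below uses only the Python standard library. The record fields are illustrative, and hash chaining detects modification rather than preventing it, so true immutability still depends on where the trail is stored (for example, write-once storage).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, record):
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(trail, record):
    """Append a record whose hash covers both its content and the previous hash."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    entry = {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    trail.append(entry)
    return entry


def verify_trail(trail):
    """Return True only if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because every hash depends on everything before it, editing an early tool call invalidates every later entry, which is the property an auditor cares about.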
- Lifecycle
Agents have a lifecycle, and identity governance needs to extend to it.
The joiner-mover-leaver process your IAM team runs for human employees has a direct analog for agents: agents are commissioned (join), their scope changes as projects evolve (move), and they should be decommissioned when their work is complete (leave). Most organizations handle none of this systematically. Agents persist in environments long after the projects that spawned them have ended, accumulating access that was never formally reviewed and never formally removed.
An agent lifecycle governance process should define:
- How agents are registered and approved for production deployment
- What identity attributes and access scopes are assigned at commissioning
- How scope changes are reviewed when an agent’s role shifts
- What triggers decommissioning and what evidence is required to confirm it
This is identity governance applied to non-human principals. The principles are identical. The tooling and processes need to catch up.
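The joiner-mover-leaver analog can be sketched as a small state machine in which every transition requires recorded evidence. The state names and record fields below are assumptions chosen for illustration, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(Enum):
    REGISTERED = "registered"          # commissioned but not yet in production
    ACTIVE = "active"
    UNDER_REVIEW = "under_review"      # scope change pending review
    DECOMMISSIONED = "decommissioned"


ALLOWED_TRANSITIONS = {
    AgentState.REGISTERED: {AgentState.ACTIVE, AgentState.DECOMMISSIONED},
    AgentState.ACTIVE: {AgentState.UNDER_REVIEW, AgentState.DECOMMISSIONED},
    AgentState.UNDER_REVIEW: {AgentState.ACTIVE, AgentState.DECOMMISSIONED},
    AgentState.DECOMMISSIONED: set(),  # terminal: no way back
}


@dataclass
class AgentRecord:
    agent_id: str
    owner: str
    purpose: str
    scope: set
    state: AgentState = AgentState.REGISTERED
    history: list = field(default_factory=list)


def transition(record, new_state, evidence):
    """Move an agent to a new lifecycle state, keeping an evidence trail."""
    if new_state not in ALLOWED_TRANSITIONS[record.state]:
        raise ValueError(f"{record.state.value} -> {new_state.value} is not allowed")
    if not evidence.strip():
        raise ValueError("a transition requires recorded evidence")
    record.history.append((record.state, new_state, evidence))
    record.state = new_state
    if new_state is AgentState.DECOMMISSIONED:
        record.scope.clear()  # leaving removes all access, not just the label
```

Making decommissioning terminal and clearing scope in the same step addresses the failure mode described above, where agents outlive their projects with access intact.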
Agent Gateways and SPIFFE/SPIRE in Practice
The control plane architecture above is not theoretical. Security teams are standing it up right now, and a reference pattern is emerging.
The core components:
SPIFFE/SPIRE handles workload identity issuance. Each agent gets an SVID, a short-lived X.509 certificate or JWT signed by the SPIRE server, that proves its identity cryptographically. The identity is automatically rotated. There are no static secrets to exfiltrate.
The agent gateway is the policy enforcement point between the agent and every resource it might touch. It inspects every tool call, validates it against authorization policy, applies DLP rules to outbound content, and logs the complete session. Think of it as the API gateway pattern applied to agentic workflows. Architecturally, it operates outside the agent’s trust boundary — the agent cannot see it, reason about it, or influence its enforcement decisions.
Enterprise API key substitution is a related pattern that solves a specific credential-protection problem. Agents that call cloud LLM APIs (OpenAI, Anthropic, etc.) typically need API keys. If the agent holds the enterprise key directly, that key is exposed everywhere the agent runs. A better pattern: the agent holds a substitute credential with no value outside its enclave. The gateway substitutes the real enterprise key on the outbound call. The enterprise key never traverses the endpoint. A compromised agent cannot exfiltrate it because it never had it.
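The substitution step itself is small. In this sketch the gateway looks up the agent’s substitute token, then rebuilds the outbound headers with the real key. The bearer-token header layout, the token values, and the function name are all assumptions for illustration. How a given LLM provider expects its key to be presented varies.

```python
def rewrite_outbound_headers(headers, substitute_tokens, enterprise_key):
    """Swap an agent's substitute credential for the real key at the gateway.

    substitute_tokens maps each substitute token to the agent it was issued to.
    Returns (agent_id, outbound_headers); the caller's headers are not modified.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token not in substitute_tokens:
        raise PermissionError("unknown or missing substitute credential")
    outbound = dict(headers)
    # The real key is attached here, on the gateway side only.
    outbound["Authorization"] = f"Bearer {enterprise_key}"
    return substitute_tokens[token], outbound
```

The agent only ever sees its own substitute token, so a compromised agent process has nothing of value to leak beyond a credential the gateway can disable.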
The enclave model goes one level further. An enclave is a project-scoped trust boundary: a sandboxed agent, the resources it’s authorized to access, and the tools scoped to its specific unit of work. An agent assigned to Project A’s enclave cannot reach Project B’s assets, tools, or other agents — not because a policy rule blocks it, but because those resources are absent from the agent’s network topology entirely. This structural isolation is what “assume breach” looks like for agentic AI.
Tool-Level Authorization: Scoping What an Agent May Call
Tool-level authorization is where the practical work happens. Most teams understand that an agent needs some form of access control. Where they get stuck is figuring out how to express that control at a level of granularity that’s actually meaningful.
Here’s the framing that works: think of each tool as an API endpoint with its own authorization policy. The agent gateway enforces that policy on every call. The policy answers three questions:
- Is this agent authorized to call this tool at all? Authorization is bound to the agent’s identity and project scope, not to a generic service account.
- Is this specific call within the agent’s operating envelope? Parameters matter. An agent authorized to query a read-only analytics endpoint should not be able to call the same system’s export-to-file endpoint, even if they’re in the same API namespace.
- Does this call cross a data sensitivity boundary? DLP policy can inspect the call payload and the expected response. An agent researching publicly available data should not be able to construct a query that returns PII, even if the underlying data source technically contains it.
The practical implementation uses a combination of ABAC (Attribute-Based Access Control) and tool manifests: a declared list of exactly which tools an agent may call, with what parameters, against which data, during which sessions. The manifest is registered at commissioning, reviewed at scope changes, and enforced at the gateway on every call.
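As a rough sketch of how a manifest answers the three policy questions on each call, consider the following. The manifest layout, parameter names (dataset, limit, query), and the single regex standing in for a DLP engine are all hypothetical simplifications.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolRule:
    datasets: frozenset  # datasets this tool may be pointed at
    max_rows: int        # upper bound on the "limit" parameter


@dataclass
class ToolManifest:
    agent_id: str
    rules: dict  # tool name -> ToolRule


# Toy DLP check: strings shaped like US Social Security numbers.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def authorize_call(manifest, agent_id, tool, params):
    """Return (allowed, reason) for one tool invocation."""
    # 1. Is this agent authorized to call this tool at all?
    if agent_id != manifest.agent_id:
        return False, "manifest belongs to a different agent"
    rule = manifest.rules.get(tool)
    if rule is None:
        return False, "tool not in manifest"
    # 2. Is this specific call within the operating envelope?
    if params.get("dataset") not in rule.datasets:
        return False, "dataset out of scope"
    limit = params.get("limit")
    if not isinstance(limit, int) or not 0 < limit <= rule.max_rows:
        return False, "row limit missing or out of bounds"
    # 3. Does the call cross a data sensitivity boundary?
    if SENSITIVE.search(str(params.get("query", ""))):
        return False, "payload matches a DLP pattern"
    return True, "ok"
```

Anything not declared is denied, which keeps the manifest as the single place where an agent’s reach is reviewed.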
This is more granular than anything most organizations apply to human users — and that’s appropriate, because agents operate at machine speed without human checkpoints. The blast radius of a misconfigured agent is proportionally larger.
Runtime Behavioral Monitoring and Anomaly Response
Security teams consistently underestimate what “monitoring AI agents” actually means in practice. It is not the same as log aggregation. It requires behavioral analysis against a declared operating envelope.
Every production agent should have a documented behavioral baseline: the expected tool-call sequence for its task type, the typical data access patterns, the expected session duration, the normal response volume. Anomaly detection compares runtime behavior against that baseline and flags deviations — not just policy violations, but behavioral drift that may indicate prompt injection, model manipulation, or an agent operating in an unexpected state.
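A declared baseline can be as simple as the set of tool-to-tool transitions a task type is expected to make, plus ceilings on call volume and session duration. The sketch below compares one session against such a baseline. The field names and the exact-match approach are assumptions, and production systems would likely use statistical or learned baselines instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    allowed_transitions: frozenset  # (previous_tool, next_tool) pairs
    max_calls: int
    max_duration_s: float


def find_deviations(baseline, tool_calls, duration_s):
    """List the ways a session departs from its declared baseline."""
    deviations = []
    if len(tool_calls) > baseline.max_calls:
        deviations.append(f"call volume {len(tool_calls)} exceeds {baseline.max_calls}")
    if duration_s > baseline.max_duration_s:
        deviations.append(f"duration {duration_s}s exceeds {baseline.max_duration_s}s")
    # Walk consecutive pairs of calls and flag any ordering the baseline does not declare.
    for prev_tool, next_tool in zip(tool_calls, tool_calls[1:]):
        if (prev_tool, next_tool) not in baseline.allowed_transitions:
            deviations.append(f"unexpected transition {prev_tool} -> {next_tool}")
    return deviations
```

A research agent that suddenly moves from fetching a document to calling an outbound email tool would surface here as an unexpected transition even if each tool is individually authorized.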
Response actions need to be pre-defined before an incident happens. The time to decide what to do about an anomalous agent is not during an active incident. Define these responses in advance:
- Session suspension — pause the agent’s execution and queue the session for human review before resuming
- Tool-call blocking — the gateway blocks specific tool invocations without terminating the session
- Credential revocation — the agent’s SVID is invalidated, forcing re-authentication
- Full termination — the session is ended, the enclave is torn down, and the incident is escalated
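Pre-defining responses amounts to writing the playbook down as data before anything goes wrong. In this sketch the four response actions above are ranked by severity and each finding type maps to one of them. The finding names, the severity ordering, and the choice to default unknown findings to suspension are assumptions for illustration.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Response(IntEnum):
    # Ordered from least to most disruptive (an assumption of this sketch).
    BLOCK_TOOL_CALL = 1
    SUSPEND_SESSION = 2
    REVOKE_CREDENTIAL = 3
    TERMINATE = 4


# Hypothetical finding types mapped to their pre-agreed response.
PLAYBOOK = {
    "tool_not_in_manifest": Response.BLOCK_TOOL_CALL,
    "call_volume_exceeded": Response.SUSPEND_SESSION,
    "unexpected_tool_sequence": Response.SUSPEND_SESSION,
    "identity_mismatch": Response.REVOKE_CREDENTIAL,
    "sensitive_data_egress": Response.TERMINATE,
}

# Findings nobody planned for are routed to human review rather than ignored.
DEFAULT_RESPONSE = Response.SUSPEND_SESSION


def choose_response(findings: Iterable[str]) -> Optional[Response]:
    """Pick the most severe pre-defined response across all findings."""
    responses = [PLAYBOOK.get(finding, DEFAULT_RESPONSE) for finding in findings]
    return max(responses) if responses else None
```

Keeping the mapping in a reviewed table means the decision made during an incident is a lookup, not a debate.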
For incidents that involve sensitive data, the audit trail from the session log is your forensic record. Prompts, responses, and tool calls are captured in full. You can reconstruct exactly what the agent did, what data it touched, and whether any of it left the controlled environment. That’s the “demonstrate containment” story your auditors and regulators will ask for.
A Maturity Checklist for Production Agents
Use this as a pre-deployment gate for any AI agent moving into production. If you can’t check all twelve boxes, the agent is not ready.
Identity
- [ ] Every agent has a unique, cryptographically verifiable identity (SPIFFE/SVID or equivalent)
- [ ] No shared API keys or inherited user credentials are used for agent authentication
- [ ] Agent identity is registered in the identity governance system alongside human and machine identities
Authorization
- [ ] A tool manifest exists declaring exactly which tools, parameters, and data sources this agent may access
- [ ] Authorization is enforced at the tool level via an agent gateway — not via blanket system access
- [ ] Just-in-time scope assignment is used: the agent receives project access at task start, access is removed at task end
Monitoring
- [ ] A behavioral baseline is documented for this agent’s expected operation
- [ ] Every session produces an immutable audit trail (prompts, responses, tool calls, tool results)
- [ ] Runtime anomaly detection is active and pre-defined response actions are configured
Lifecycle
- [ ] The agent is registered in a central inventory with owner, purpose, and approved scope documented
- [ ] A review cadence is defined for scope validation (recommend: quarterly, or at each project milestone)
- [ ] Decommissioning criteria and process are documented before the agent goes live
Four categories. Twelve controls. If an agent can’t clear this checklist before it reaches production, you’re accepting risk you haven’t priced in.
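If the checklist is to act as a real gate, it helps to encode it so a deployment pipeline can refuse an agent with gaps. The sketch below restates the twelve controls listed above as short keys. The key names and the idea of an “attested” set supplied by the agent’s owner are assumptions for illustration.

```python
CHECKLIST = {
    "identity": [
        "unique_verifiable_identity",
        "no_shared_credentials",
        "registered_in_governance_system",
    ],
    "authorization": [
        "tool_manifest_exists",
        "gateway_enforced_tool_authorization",
        "just_in_time_scope",
    ],
    "monitoring": [
        "behavioral_baseline_documented",
        "immutable_audit_trail",
        "anomaly_detection_with_responses",
    ],
    "lifecycle": [
        "central_inventory_entry",
        "review_cadence_defined",
        "decommissioning_documented",
    ],
}


def missing_controls(attested):
    """Return every checklist control that has not been attested, as category.control."""
    return [
        f"{category}.{control}"
        for category, controls in CHECKLIST.items()
        for control in controls
        if control not in attested
    ]


def ready_for_production(attested):
    # The gate is all-or-nothing: any missing control blocks deployment.
    return not missing_controls(attested)
```

Returning the list of missing controls, rather than a bare yes or no, gives the agent’s owner a concrete remediation list.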
The Architecture Decision That Matters Most Right Now
If your organization is deploying AI agents — and the data says 79% of enterprises are — the most important architecture decision you can make today is to designate a policy enforcement layer that operates outside the agent’s trust boundary.
Not a policy document. Not a governance committee. An enforcement layer: an agent gateway, a SPIFFE trust domain, or an enclave architecture that contains agent behavior structurally, not just procedurally.
Procedural controls depend on the agent following the rules. Structural controls hold regardless of what the agent does, what it’s been prompted to do, or what external data it has ingested.
The market has recognized this gap, and a set of purpose-built solutions is emerging around the execution layer. These are worth understanding as you evaluate your architecture options.
SafePrompts.ai is entering this space with a focus on prompt-layer threat detection — inspecting both inbound prompts and outbound agent responses for injection patterns, policy violations, and behavioral drift before actions execute. It represents the direction the market is moving: enforcement that wraps the agent’s reasoning layer, not just the perimeter around it.
Prompt Security operates as a runtime enforcement and MCP gateway layer, sitting between AI applications and the tools, models, and data sources they connect to. It covers prompt injection detection, sensitive data redaction at egress, and shadow AI discovery — useful for organizations dealing with heterogeneous AI environments where multiple LLMs and self-hosted models coexist.
HiddenLayer’s AI Runtime Security — significantly expanded in March 2026 — focuses specifically on agentic runtime: detecting prompt injections, malicious tool calls, and cascading attack chains in multi-step autonomous workflows. It integrates directly into agent gateways and execution frameworks, which makes it practical for teams who need phased deployment without rewrites.
AppOmni AgentGuard addresses the SaaS-native agent problem specifically — real-time prompt scanning, jailbreak detection, and policy enforcement for agents operating inside SaaS environments. For organizations where the primary agentic exposure is Copilot, Salesforce Agentforce, or ServiceNow-native agents, this is a relevant option.
Lakera Guard (now part of Check Point following the September 2025 acquisition) remains one of the most widely deployed runtime LLM firewalls, inspecting prompts and responses for injection, jailbreak attempts, PII, and exfiltration patterns as an API service that can be dropped in front of any agent framework.
What these solutions share is the same architectural principle: enforcement that operates outside the agent’s reasoning context, at the boundary between the agent and the resources it can act on. That boundary is where the structural control lives. No matter which tooling you select, the enforcement layer has to be external to the agent itself — an agent cannot reliably self-police against prompt injection, because the attack surface is the agent’s own reasoning process.
The enterprises that get this right will be the ones that scale agentic AI safely. The ones that don’t will eventually discover that autonomous systems amplify trust failures at machine speed.
This post is part of TechVision Research’s June 2026 series on agentic AI security and identity governance. Next up: The Non-Human Identity Crisis: What the 109:1 Machine-to-Human Ratio Means for IGA