The AI Vulnerability Your Governance Program Has Not Planned For

The Control Gap

Your model risk framework distinguishes between a model that makes a prediction and an agent that executes a transaction. But your control architecture still assumes the same permission boundary applies to both.

That assumption is now broken.

Prompt injection is not a defect in a model. It is the exploitation of a model's core functionality—its ability to follow instructions—using data the model is authorized to ingest. The agent reads content. That content contains instructions. The agent follows them. No privilege escalation, no credential theft, no code execution required. The control architecture is already in place; the attacker is already inside the permission boundary.

This is why no security product can solve it as a class. Filters and detection systems can narrow the surface, but they cannot eliminate the vulnerability without destroying the agent's core capability.

What Prompt Injection Actually Is

Start with what it is not. It is not an injection attack in the classical sense. A traditional SQL injection attack exploits a parsing defect: the parser treats user input as code rather than data. A prompt injection attack exploits intentional design: the model is designed to process natural language instructions, and an attacker supplies those instructions within data the model is authorized to read.

The distinction matters operationally. You can patch a SQL injection vulnerability by fixing the parser. You cannot patch a prompt injection vulnerability by changing how a model processes language without destroying its functionality.

Prompt injection is an integrity control failure. The agent's behavior is altered by unauthorized instructions embedded in content the agent can legitimately access. The agent performs actions the business logic did not intend. The human who authorized the agent's scope did not authorize those specific actions.

"The attacker did not need a human to click anything. They needed a human to exist."

Two Named Exploits, Two Different Attack Surfaces

Exploit One: EchoLeak (Microsoft 365 Copilot)

Public Disclosure

EchoLeak was documented by Chen et al. in a preprint paper in 2024 and disclosed to Microsoft researchers. The exploit demonstrates how an agent with read access to email and markdown rendering capability can be instructed—via specially formatted content embedded in an email—to exfiltrate sensitive information from the email thread.

The mechanism: An attacker crafts an email containing hidden markdown instructions. When the AI agent reads the email to generate a summary or draft a response, it processes the markdown. The markdown contains an instruction to forward the email content or the email context to an external endpoint. The agent performs the instruction because it is in the data the agent is authorized to read.

This is not a user clicking a malicious link. This is an authorized capability (reading email, rendering markdown) being redirected by instructions embedded in the data stream.

Exploit Two: ServiceNow Multi-Agent Chaining

Operational Scenario

ServiceNow agents often operate with multi-step workflows: one agent creates a ticket, a second agent evaluates it against approval policies, a third agent approves or denies. Each step is a permission boundary. An attacker who can write content into a ticket description field can embed instructions that redirect the evaluation agent's behavior. Instead of evaluating the ticket against the configured policy, the agent receives an embedded instruction to approve the ticket regardless of the policy logic.

The attacker has not gained unauthorized access to the approval agent. The approval agent is performing exactly as designed: it is reading structured data (a ticket) and taking action based on that data. The attacker has simply changed what the data says the agent should do.

This is sequence gap exploitation—the attacker does not break the gate, the attacker inserts a new instruction into the gate itself.

Why This Is Structurally Different From Patchable Vulnerabilities

In classical cybersecurity, a vulnerability is a deviation from specification. A buffer overflow is unintended behavior. A privilege escalation is unintended authorization. A SQL injection is unintended code execution. In each case, the fix is to bring the system back into specification—validate input, check bounds, parse correctly.

Prompt injection is specified behavior. The model is designed to read natural language and respond to instructions. When an attacker embeds instructions in data the model is authorized to read, the model is performing exactly as designed.

The vulnerability is not in the model. The vulnerability is in the control architecture—the assumption that the permission boundary sits between the agent and the instruction source. In an agentic system that reads external content, that boundary no longer exists.

A security control that prevents all prompt injection would have to prevent the agent from understanding language in the data it reads. That is not a patch; that is a fundamental redesign of what the agent does.

The Regulatory Imperative

FINMA Guidance 08/2024

FINMA expects financial institutions to assess operational risk from AI systems with the same rigor as traditional operational risk. That assessment must include scenarios where an agent's behavior is altered by instructions in data the institution itself has authorized the agent to access. The operational risk is not mitigated by saying "the model was designed to process natural language." The operational risk is the fact that it was designed to process natural language.

Guidance 08/2024 explicitly requires institutions to document control architecture and to validate that controls remain effective when the AI system operates at scale and under production conditions. Prompt injection is a control failure that operates at scale and surfaces in production the moment an agent with external data access goes live.

EU AI Act—Annex III and Article 9

The EU AI Act classifies agentic AI systems that operate with unsupervised external data access (including email, file systems, knowledge bases) as high-risk systems under Annex III. Article 9 requires risk management systems that include hazard identification, risk assessment, and risk mitigation. Prompt injection is a named hazard in several EU guidance documents. Risk mitigation strategies must address the fact that the agent reads unvetted content and that unvetted content can contain executable instructions.

Technical documentation (Article 13) must explicitly address how the system prevents unauthorized instruction injection or, if mitigation is not fully available, how the risk is monitored and the harm contained. A risk management plan that does not address prompt injection in an agent with external data access is incomplete under Article 9.

GDPR Article 32—Appropriate Technical Measures

When an AI agent processes personal data, Article 32 requires that the controller implement appropriate technical measures to ensure integrity. If a prompt injection attack alters an agent's behavior and causes the agent to process personal data outside the scope of the original authorization, that is a data processing integrity failure. The organizational measure required is not a filter or a detection system; it is a redesign of where the authorization boundary sits.

What Governance Design Actually Requires

The standard response to prompt injection is to propose filters, detection, or input sanitization. These are partial mitigations, not solutions. A filter can reduce surface area but cannot eliminate the vulnerability without destroying the agent's core capability.

Structural governance redesign requires rethinking where the permission boundary sits. Three approaches are operationally viable:

Content Vetting. The agent does not read unvetted external content. All content the agent processes is pre-screened by a separate human or automated system to ensure no embedded instructions are present. This eliminates the vulnerability but severely constrains what the agent can do. It is viable for high-value, low-frequency workflows.

Behavioral Audit and Human Gate. The agent's intended actions are logged before execution. A human reviews the logged action and confirms it aligns with the business objective before the action executes. This does not prevent prompt injection; it catches it. The agent operates in a supervised-execution model. Operationally, this means the agent generates a recommendation, not a transaction.

Narrow Scope and Permission Isolation. The agent's capabilities are strictly bounded. It reads a specific data source that is itself protected from unauthorized instruction injection. It performs a single, well-defined action. It cannot chain to other agents or systems. This limits the operational value of the agent but eliminates cascading failure surfaces.

Governance Implication

If your organization has deployed or is planning to deploy AI agents with unsupervised external data access and without a human review gate, you have an integrity control gap that existing security products cannot close. That gap exists by design, not by oversight. The only way to close it is to change the design.

This is not a model risk issue. This is a control architecture issue. It does not appear in model evaluation because it is not a model defect. It appears in deployment design review.

The Diagnostic Question

Take this question to your risk and compliance leadership: Does your current model risk framework distinguish between the risk surface of a model that makes a prediction and an agent that receives instructions from external content it is authorized to read? If the answer is no, your model risk assessment is incomplete before the agent goes to production.

This is not a future risk. Organizations are deploying AI agents with this vulnerability today. The vulnerability is open.