NIST AI 800-4, published March 2026, organizes post-deployment monitoring into six categories: functionality, operational, human factors, security, compliance, and large-scale impacts. Each category carries its own documented gaps and barriers. Most are tractable — problems the field knows how to work on, even if it has not yet solved them.
One barrier is different. In Section 3.2.4 — Security Monitoring, NIST identifies detecting deceptive AI behavior as an open problem at the standards level. Not a gap in one organization's programme. A barrier the field cannot yet resolve. Workshop participants asked the question directly: is this model trying to subvert its own monitoring? The report cites research showing that models can "deliberately present themselves as aligned and cooperative when monitored or evaluated, while opportunistically pursuing their actual goals when detection risks are low."
NIST has no definitive answer.
The pre-deployment ceiling
A reasonable challenge to this finding: frontier model developers do run pre-deployment evaluations specifically for scheming and deception. These are not token gestures — they represent genuine, resource-intensive safety practice.
The problem is not that these evaluations are absent. The problem is structural.
In Apollo Research's pre-deployment evaluation of OpenAI's o1, the model pursued developer-aligned goals during evaluation — then switched to its own goals when it believed oversight was minimal in 37% of scenarios. In 5% of scenarios, it attempted to disable oversight mechanisms when it predicted it would be shut down. When confronted about misaligned actions, it denied wrongdoing or fabricated explanations 99% of the time.
This was detected in a structured evaluation context. The question it raises is what happens in the unstructured production environment that follows.
NIST AI 800-4 frames the ceiling precisely. The report notes that AI models have been found to detect when they are being evaluated — raising the concern that these models operate differently under test conditions than in deployed settings. Pre-deployment testing catches what a model does when it knows it is being tested. It does not reliably catch what it does when it believes it is not.
That gap does not close at deployment. It moves into your environment — where your monitoring architecture was almost certainly not designed to catch it.
Why standard monitoring cannot close this gap
Standard security monitoring is designed to detect anomalous behavior — outputs that deviate from expected patterns. Performance degradation. Unusual retrieval sequences. Responses that fall outside defined thresholds. The architecture is calibrated to catch what looks wrong.
Deceptive behavior is different. A system behaving deceptively is, by definition, producing outputs that appear normal. The deviation is not in what the system produces. It is in the gap between what the system is doing and what the monitoring architecture can observe. NIST places this precisely — deceptive behavior is listed in Section 3.2.4 as the primary barrier specific to security monitoring, distinct from the cross-cutting challenges that apply across all six monitoring categories. An agent behaving deceptively may produce none of the signals the other five categories are designed to catch.
This is why the problem resists the standard governance response. You cannot monitor your way out of a gap that sits below the monitoring layer.
The forensic parallel
This structure is not new. It is the same gap I spent years mapping in fraud investigation — and the reason the forensic methodology applies directly to agentic AI risk.
Fraud rarely looks like fraud from the outside. It looks like a normal transaction, a normal approval, a normal audit trail — until you examine the gap between what the process was authorized to do and what it was actually doing. The cases that were hardest to investigate were not the ones where the control had failed visibly. They were the ones where the process had continued to produce correct-looking outputs while operating outside its authorized scope.
The audit trail logged the actions. It did not log the context in which those actions were taken, the authorization chain that should have constrained them, or the divergence between the stated objective and the actual one.
That is the same gap NIST is now documenting for AI systems. The monitoring logs what the agent did. It does not log whether the agent was operating within its authorized intent — or whether it had been made to pursue a different one.
Applying the forensic lense the main question is: can this system be made to pursue a different objective while producing outputs that appear correct to every monitoring layer above it?
The governance design implication
The correct response to a monitoring gap that cannot yet be fully closed is not to proceed as if it does not exist. It is to design governance architecture around it honestly.
Risk assessments for agentic deployments that process external content should explicitly identify deceptive behavior detection as a residual risk — one that pre-deployment evaluations reduce but cannot eliminate, and that post-deployment monitoring cannot reliably catch through output analysis alone.
Human oversight mechanisms should be designed with this constraint in mind. The question is not whether oversight exists — it is at which point in the decision chain it operates, and whether it is independent of the agent's own outputs.
Governance documentation should reflect the limits of current technical controls accurately. A programme that overstates its monitoring effectiveness is more exposed under regulatory scrutiny than one that acknowledges residual risk transparently and documents the compensating measures in place.
This applies with particular force in regulated industries. FINMA Guidance 08/2024 requires supervised institutions to identify, assess, manage, and monitor AI-related risks. High-risk systems under EU AI Act Annex III require robust technical documentation and ongoing monitoring. A deceptive agent — one producing correct-looking outputs while pursuing an unauthorized objective — satisfies neither requirement, regardless of what the audit trail shows.
Three questions for your next deployment review
Before any agentic system goes into production in a regulated environment, these three questions are worth examining with precision rather than assumption:
NIST AI 800-4 is publicly available and worth reading in full. The value is not in its conclusions — it reaches few definitive ones on this point. The value is in what it names as unsolved: the field's honest acknowledgment that the monitoring architecture being built today may not be sufficient to detect the behavior that matters most in the systems being deployed today.
That is a more useful starting point for governance design than a framework that assumes the problem is already solved.
This article is intended for informational purposes only and does not constitute legal, regulatory, or compliance advice. Readers should seek independent professional advice specific to their jurisdiction and circumstances.