Agentic Models, Part 2
Moving Upstream from Browsers to Content
Last week I wrote about why agentic browsers invalidate traditional assumptions about intent and authority. A few readers asked an interesting question: If the reasoning layer can be manipulated, what does defense actually look like?
Given the attention (and funding) afforded to AI these days, many companies assume AI models, themselves, are where security control should live. Train the model better, prompt it better, tighten the instruction hierarchy, align it more closely with corporate policy.
It’s the implication that if we iterate hard enough, if we focus carefully enough on the reasoning layer, we can eliminate risk at the point of possible attack.
Yet, this reasoning (pun intended) is faulty for numerous reasons. If we tweak the model, if we focus on the end goal, we miss the warning signals and move the damage layer to runtime, where the risks are higher, the outcomes are less predictable, and the implications are farther reaching.
For clarification, when I write “reasoning layer,” I’m referring to the part of the system where the model interprets context and decides the consequences: which tools to call, what data to retrieve, what action to recommend. It is the translation boundary between language and execution. With AI, the boundary is assumed intelligent because it produces fluent output, but from a security standpoint it’s just like any other interpreter that converts input into instructions. Subjectivity is inherent. Even when we’re dealing with a non-sentient systems.
However, security practitioners have heard this script before. It’s been part of the messaging about browser sandboxing, client-side automation layers, and even with plugin ecosystems. If we trust the message, the trust pattern repeats: hype the abstraction, ignore the interpreter’s border, and wait for a threat actor to own your system.
Your Model is Not the Boss of Me
Because we’re talking about AI, which operates entirely on a system of data ingestion, we must look at how an attacker can exploit inputs, in AI’s case, prompts.
Prompt injection is the main concern here, as mentioned in part 1, and updates to a training dataset or tweaks a prompt template don’t eliminate the risk. Whenever a user (friendly or adversarial, human or machine) supplies input (data, prompts, etc.), the system combines probabilistic reasoning with those inputs. If we blindly trust the prompts, we’re blindly accepting risk by assuming no one/nothing has malicious intent. That’s a dangerous assumption.
When it comes to large language models (LLMs), they do not have an inherent concept of trust (which is good when assuming humans are in the loop). LLMs offer the most plausible continuation of context they are given (i.e., data in; data out). That is their strength and also their weakness (i.e., garbage in; garbage out)
If one expects an AI model to enforce security control inside its own reasoning process, it would be akin to expecting a programming language interpreter to separate code from data in every conceivable adversarial scenario. We’ve seen what happens when that assumption fails: entire classes of injection vulnerabilities, cross-site scripting, command interpreters manipulated through esoteric input channels, and an industry built around containing those failure modes. Trying to make the interpreter enforce its own security control has never worked. Instead we must build control around the interpreter.
Agentic systems put us back in the “trust” position with a new interface and a new vocabulary but the same core issue. If you feed unverified content to a system that executes actions, you have created an attack surface. Nothing about the AI model’s internal logic changes that.
Execution Changes Everything
To be implemented and achieve the promised efficiencies AI offers, its agents must have some sort of system access. Anytime a system can use credentials, interact with SaaS tools, move data, trigger workflows, call APIs, and so on, we have moved out of the realm of theoretical risk and into the realm of execution management. At this point, we’re not talking about a chatbot that delivers plausible text. We are talking about a runtime surface that acts like a human user, but with machine speed and persistence.
Security teams have seen this movie before—with API gateways, automation engines, cloud orchestration layers, CI/CD pipelines, and service meshes. In each scene, we watched as execution surfaces required control: boundary enforcement, policy evaluation, identity scoping, observability, and clear attribution.
In the same vein, treating an autonomous agent as anything other than an execution surface is a real risk that most security practitioners shouldn’t want to take. It’s easy for the business to buy into the hype and pretend autonomy is just a UI feature. But security teams (should) know that accepting agentic models introduces enhanced risk as soon as the system exercises autonomous authority.
When it comes to the “so what,” the risk is not just that malicious content is an potential output; it’s that the agent interprets content as admin-supplied, authorized instructions and executes commands with full privileges and valid credentials. When that happens, nothing in the audit logs signals compromise. What operators see is legitimate access, legitimate actions, and yet the outcome is—predictably—highly problematic. And not an anomalous one-off situation, unfortunately.
Context Is a (Potential) Compromise Channel
One of the more pernicious misconceptions about LLM data is that context is inherently “clean.” Internal knowledge bases, documentation repositories, project trackers, chat histories—these are all sources the model may ingest for its reasoning process. From the model’s perspective, sources are simply more text. From a security standpoint, these are input channels an attacker can influence.
And from a security standpoint, once something becomes an input channel, it belongs in the threat model. Considering content benign because it looks like vetted or validated documentation is no different from trusting a hidden HTTP header or a JavaScript payload simply because it was “internally” supplied.
Prompting techniques—hierarchies, system instructions, context filtering—help eliminate the potential for vulnerabilities. But they do not eliminate the fundamental fact that the model cannot independently determine trust. Today, AI systems are merely machines with lightening-fast interpretations of data. It’s math—interpretations based on likelihood. Instruction layering may delay exploitation, but it does not prevent the machine from interpreting an adversary’s commands as guidance.
In a system that executes on context, data retrieval is not passive reading; it’s the instruction that initiates execution. Anything less than treating it as hostile input is hope, not a security strategy.
Authorization Requires Verification, Not Trust
Security professionals shouldn’t be surprised by any of this. We have spent years touting “Zero trust! Never trust! Always verify!”1 The AI model should be able to propose a next step but never allowed to unilaterally authorize it. This is the “separation of church and state” required to apply a modicum of control and oversight to all systems of enforcement as a protection against wrongdoing.
Further, this is where runtime security architecture matters. The LLMs model’s reasoning output needs to evaluate prompts against policies before any system action is taken, not after. Actual enforcement requires that every action—whether it’s a tool invocation, an API call, or a workflow transition—passes through a layer that can accept, deny, or escalate based on deterministic policy.
Privilege Predicts Damage
Autonomous systems inherit any authority supplied credentials allow, which means that practical defense is neither mystical nor model-driven; it is operational. As such:
Retrieved context: belongs in the same security category as user input (prompts) and should pass through sanitization and scope limits before it influences execution.
Least privilege: Should apply to all agents. Enforce short-lived tokens and pass sensitive actions through deterministic policy verification the model cannot bypass.
Capture context: Using runtime telemetry that shows:
what the agent accessed
the actions attempted
which policies were allowed or denied.
Security teams already apply these concepts to APIs, service accounts, and automation frameworks. AI systems introduce yet another execution layer into the enterprise, and like every execution layer before them, they’ll either be managed through architecture or exploited because of optimism.
The good news is that security teams already know how to secure interpreters; the challenge is remembering to focus on the foundations built for innumerable innovations that came before and treat AI accordingly.
OK, at least I have. But for a while it was my job so, there you have it.




