Prompt Injection Is a Product Security Problem, Not a Model Limitation
The most common framing of prompt injection in 2026 still treats it as a problem with the model: the model is too gullible, the system prompt is not strong enough, the alignment training has gaps. This framing produces the wrong defences. It produces longer system prompts, more elaborate "ignore previous instructions" filters, content classifiers trained on adversarial examples. None of these meaningfully reduce the realised attack rate in production.
The framing that does reduce the attack rate treats prompt injection as a product security problem in the same family as command injection, SSRF, and SQL injection. Like its predecessors, it is fundamentally an architecture issue: the application allowed untrusted input to influence privileged execution. The defence is the same family of patterns we have used for two decades against the older injection classes, applied to LLM-mediated control flow.
This piece is the product-security reading of prompt injection, intended for engineering leads building production GenAI systems. It is opinionated about which mitigations actually work and which are theatre.
Why model-quality fixes don't close the class
The intuition that "a smarter model would resist prompt injection better" is partly true and entirely insufficient. Frontier models in 2026 are dramatically better at refusing direct adversarial prompts than 2023 models. The realised attack rate in production has not dropped proportionally. The reason is that the attack vector evolved.
Direct prompt injection (user types adversarial instructions) is now well-defended in most production systems. The realised attacks are indirect: adversarial content arrives via retrieved documents, web pages the agent fetched, files the user uploaded, tool outputs from external services, or memory entries from earlier turns. By the time the model sees the injection, it is interleaved with legitimate content from a trusted-looking source.
A perfectly aligned model that follows its instructions perfectly will still execute an injected instruction in this case, because the model cannot distinguish "this came from my operator" from "this was retrieved from a web page." The information needed to make that distinction is not in the model's input; it has to come from the application architecture.
This is exactly the structural reason classic SQL injection was not solved by "smarter parsers." It was solved by parameterised queries, which separate the data path from the control path at the application layer. The same separation is what closes prompt injection.
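For the concrete analogue, here is a minimal sketch of that separation using Python's standard sqlite3 module; the table and the attacker string are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, owner TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_supplied = "alice' OR '1'='1"  # attacker-controlled input

# Vulnerable: the input is spliced into the control path (the SQL text itself),
# so the attacker's quote characters rewrite the query's meaning.
vulnerable = conn.execute(
    f"SELECT id FROM orders WHERE owner = '{user_supplied}'"
).fetchall()  # returns every row

# Parameterised: the query structure is fixed first; the untrusted value is
# bound afterwards on the data path and can only ever be a value.
safe = conn.execute(
    "SELECT id FROM orders WHERE owner = ?", (user_supplied,)
).fetchall()  # returns no rows

print(vulnerable, safe)
```

The parallel to keep in mind: the parameter marker plays the role of the structured-output schema, and the fixed query text plays the role of application-owned control flow.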
The pattern: separate the read context from the write authority
The single architectural pattern that closes most realised prompt injection is privilege separation between what the model is allowed to read and what the model is allowed to cause. The model can read from anywhere, including untrusted sources, but nothing in that reading context gives it the authority to cause privileged actions directly.
In practice this manifests as four distinct layers in the application:
1. The conversation layer. Where the model reads from any source — the user, retrieved documents, tool outputs, memory — and produces text or structured output. This layer has no direct privileges. It cannot write to the database, call external APIs that mutate state, or trigger user-visible actions.
2. The structured-output schema. Where the model expresses what it wants done, but in a constrained form: JSON conforming to a predefined schema, function-call arguments matching a signature, command names from a known set. Free-form text in this layer is rejected, not parsed leniently.
3. The validation gate. Where structured output is checked against business rules, user permissions, action-budget limits, and provenance. This is normal authorisation logic; it does not need to understand language. It checks "does this user have permission to perform this action on this resource right now."
4. The execution layer. Which performs the action and only the action, with no model in the loop. The execution can be reasoned about by traditional security tooling because it is no longer LLM-mediated.
When you build with these four layers, prompt injection becomes scoped. The injected instruction can change what the model says (annoying, sometimes embarrassing) but it cannot change what the system does, because the system's actions are gated through layers 2–4 that do not honour free-form natural-language instructions.
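A minimal sketch of layers 2 through 4, assuming pydantic v2 for the schema and a single hypothetical refund_order action; the field names, the budget check, and the stubs are illustrative rather than a prescribed implementation.

```python
from pydantic import BaseModel, Field, ValidationError

# Layer 2 (structured-output schema): the only vocabulary the model may use to
# request this action. Free-form text that does not parse into it is rejected.
class RefundRequest(BaseModel):
    action: str = Field(pattern=r"^refund_order$")
    order_id: int
    amount_cents: int = Field(gt=0, le=50_000)  # hard ceiling owned by the product, not the model

# Stubs standing in for real business logic and the real execution layer.
def user_owns_order(user_id: str, order_id: int) -> bool:
    return (user_id, order_id) in {("alice", 42)}

def within_action_budget(user_id: str) -> bool:
    return True

def issue_refund(order_id: int, amount_cents: int) -> None:  # Layer 4: no model in the loop
    print(f"refunded {amount_cents} cents on order {order_id}")

# Layer 3 (validation gate): ordinary authorisation logic; it never interprets language.
def validation_gate(req: RefundRequest, user_id: str) -> bool:
    return user_owns_order(user_id, req.order_id) and within_action_budget(user_id)

def handle_model_output(raw_json: str, user_id: str) -> None:
    try:
        req = RefundRequest.model_validate_json(raw_json)
    except ValidationError:
        return  # log and drop; never "repair" free text into an action
    if validation_gate(req, user_id):
        issue_refund(req.order_id, req.amount_cents)

# An instruction injected via a retrieved document can only reach issue_refund as
# schema-conformant JSON that also passes the gate for this specific user.
handle_model_output('{"action": "refund_order", "order_id": 42, "amount_cents": 1500}', "alice")
```

The shape is the point: the only path from the model to the refund is through a parse that rejects free text and a gate that checks ordinary permissions.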
Where teams break this pattern by accident
The pattern is not novel. Teams break it because they build for velocity first and security later. The common failure modes:
Tool definitions that accept free-form text. A tool that takes a single query parameter of unconstrained text is just a backdoor from the model to the database. Tools should accept structured arguments validated against a schema, not a bag of free text.
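To make the difference concrete, here are two hypothetical tool definitions in the JSON-schema style most function-calling APIs accept; the tool names and fields are invented for the example.

```python
# Anti-pattern: one unconstrained text field the model can route anything through.
loose_tool = {
    "name": "query_orders",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},  # effectively a raw query DSL by the back door
        "required": ["query"],
    },
}

# Constrained: typed fields, closed enums, bounded ranges. Anything the schema
# does not name simply cannot be expressed as a tool call.
constrained_tool = {
    "name": "lookup_order",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "integer", "minimum": 1},
            "field": {"type": "string", "enum": ["status", "eta", "total"]},
        },
        "required": ["order_id", "field"],
        "additionalProperties": False,
    },
}
```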
The "let the model decide" anti-pattern. Model output that is parsed by another model, which then dispatches actions. Each model in the chain is a layer of indirection through which the injection propagates without ever passing a validation gate. The model is treated as the validation, when the model is exactly the thing being attacked.
Memory entries treated as trusted. A memory store that captures arbitrary content from previous turns and re-injects it on later turns is a perfect injection persistence layer. Memory entries should be tagged with provenance, and untrusted entries should not be eligible to influence privileged actions.
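One way to express that tagging, as a minimal sketch; the trust levels and the eligibility rule are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    OPERATOR = "operator"    # system prompt, product configuration
    USER = "user"            # typed by the authenticated user
    RETRIEVED = "retrieved"  # web pages, uploaded files, external tool outputs

@dataclass(frozen=True)
class MemoryEntry:
    text: str
    provenance: Provenance
    turn: int

def eligible_for_privileged_context(entry: MemoryEntry) -> bool:
    # Retrieved content may be summarised or displayed, but never re-enters the
    # context that drives privileged tool calls.
    return entry.provenance in (Provenance.OPERATOR, Provenance.USER)
```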
Caching across users. A cache key that does not include user identity, or a session that bleeds context, allows one user's injection to land in another user's session. This is the same "blast radius" issue as in cloud architecture: the failure pattern is shared fate.
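The fix here is mechanical. A minimal sketch of a cache key scoped to the user and session, assuming a generic key-value cache in front of the model:

```python
import hashlib

def cache_key(user_id: str, session_id: str, prompt: str) -> str:
    # Scoping the key to the user and session keeps one user's poisoned context
    # from ever being served into another user's turn.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"{user_id}:{session_id}:{digest}"
```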
We see all four of these in the production systems we audit. Fixing them is not a security project bolted on at the end; it is an architecture refactor that pays for itself in operational stability as well.
The detection layer that should also exist
Architectural mitigation reduces the attack surface; it does not eliminate it. A complementary detection layer covers the residual:
- Anomaly detection on the model's structured outputs. Sudden requests for high-privilege actions, unusual argument shapes, and repeated failures of the validation gate from a single session are all detection signals.
- Action-rate metrics per session and per data source. A session that suddenly starts triggering 30 actions in a minute, when a baseline session triggers 2, is exhibiting either a bug or a successful injection.
- Provenance-aware logging. Every action is logged with the trust tags of the inputs that led to it. Post-incident reconstruction depends on this telemetry; the older "we logged the prompt and the response" pattern is too coarse to support an investigation. A sketch of the last two signals follows this list.
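A minimal sketch combining a per-session action-rate check with a log line that carries the trust tags of the inputs behind each action. The threshold and the log shape are assumptions, not a prescribed format.

```python
import json
import time
from collections import defaultdict, deque

ACTIONS_PER_MINUTE_LIMIT = 10         # illustrative threshold; baseline sessions trigger far fewer
_recent_actions = defaultdict(deque)  # session_id -> timestamps of recent actions

def record_action(session_id: str, action: str, input_provenance: list[str]) -> bool:
    """Log the action with the provenance of the inputs that led to it, and
    flag the session if its action rate leaves the normal band."""
    now = time.time()
    window = _recent_actions[session_id]
    window.append(now)
    while window and now - window[0] > 60:
        window.popleft()

    anomalous = len(window) > ACTIONS_PER_MINUTE_LIMIT
    print(json.dumps({
        "ts": now,
        "session": session_id,
        "action": action,
        "input_provenance": input_provenance,  # e.g. ["user", "retrieved:web"]
        "rate_anomaly": anomalous,
    }))
    return anomalous
```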
These detections feed into the SIEM/SOAR practice we covered previously; GenAI systems are now part of the detection surface.
Where this connects to our practice
Pelican Tech's AI Solutions practice builds production GenAI systems with the four-layer architecture above as the default, not as an afterthought. We work alongside our risk management team when the GenAI threat model needs to integrate with the broader cyber programme, and with our SIEM/SOAR specialists to bring the detection layer into the existing SOC.
If your GenAI application is in production and the architecture does not yet have the read/write privilege separation, that is the engagement to start with before you encounter the realised version of this attack class.