AI Security · OWASP · LLM · Application Security

OWASP LLM Top 10 in Production: Hardening RAG, Agents, and the Vectors Most Teams Miss

Pelican Tech 6 min read
[Illustration: abstract dark composition with a blue LLM core surrounded by orange exploit-vector traces, evoking a GenAI risk surface map]

The OWASP LLM Top 10 (2025 edition) is a useful list. It is also a list, which is the wrong artefact for the work that actually has to happen. Lists make checkboxes. The hardening of a production GenAI application is not a checkbox exercise; it is a threat-modelling and architecture exercise that uses the list as a starting prompt rather than as the destination.

This piece is the operational reading of the LLM Top 10 we use with clients building production GenAI systems. It maps each category onto where the risk actually shows up in real architectures (RAG pipelines, agentic systems, fine-tuning workflows) and what the corresponding engineering control looks like. We are opinionated about which categories matter most in 2026 and which are over-emphasised relative to their realised risk.

The risks that consistently land in production incidents

In our incident-response work on GenAI applications over the last 18 months, three of the OWASP categories account for the substantial majority of realised harm:

LLM01 Prompt Injection. This remains the dominant attack vector by a wide margin. The interesting evolution is that direct prompt injection (a user typing "ignore previous instructions") is now well-defended in most production systems; the realised attacks are almost all indirect prompt injection, where attacker-controlled content arrives via retrieval, web fetches, file uploads, or tool outputs. The defence pattern that actually works is privilege separation between the model's read context and its write authority, not better system prompts.

LLM02 Sensitive Information Disclosure. This usually shows up as either training-data leakage (rare in production-grade RAG) or context leakage across users (much more common). The pattern: a system prompt or retrieved context contains data from user A, then a request from user B inherits some of that context due to caching, session reuse, or tenant-isolation mistakes. The control is strict per-request context construction and no shared model state between users.
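A minimal sketch of the per-request construction pattern, assuming a hypothetical tenant-scoped vector index and a stand-in retrieval function; the point is that nothing in the context survives across users or requests:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestContext:
    tenant_id: str
    user_id: str
    chunks: tuple[str, ...]   # retrieved text, scoped to this tenant only

def retrieve_chunks(index: str, query: str) -> list[str]:
    # Stand-in for the real vector-store call; assumed to hit a per-tenant index.
    return [f"[{index}] result for: {query}"]

def build_context(tenant_id: str, user_id: str, query: str) -> RequestContext:
    # Context is rebuilt from scratch on every request: no shared cache,
    # no session carry-over, no chunks from another tenant's index.
    chunks = retrieve_chunks(index=f"tenant-{tenant_id}", query=query)
    return RequestContext(tenant_id=tenant_id, user_id=user_id, chunks=tuple(chunks))
```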

LLM05 Improper Output Handling. The model produces output that is then consumed unsafely by downstream code (executed as a query, parsed as JSON without validation, rendered as HTML). In agent architectures this becomes severe because tools execute model output as instructions. The control is treating model output as untrusted input, the same way you would treat user input.
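A short illustration of that rule using only standard-library calls: model output is escaped before it is rendered and bound as a parameter before it touches the database, exactly as user input would be.

```python
import html
import sqlite3

def render_answer(model_output: str) -> str:
    # Escape before rendering so model-generated markup can never become live HTML or script.
    return f"<p>{html.escape(model_output)}</p>"

def lookup_order(conn: sqlite3.Connection, model_extracted_id: str) -> list[tuple]:
    # Parameterised query: the value the model produced is data, never SQL text.
    cur = conn.execute("SELECT * FROM orders WHERE id = ?", (model_extracted_id,))
    return cur.fetchall()
```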

If you address these three with rigour, you have addressed roughly 75% of the realised risk. The other categories matter, but they are over-weighted in most threat-model exercises relative to incident frequency.

Indirect prompt injection in RAG: the architectural fix

A RAG system has three places where attacker-controlled content can enter the model's context: the document corpus (if any user can contribute), web retrieval (if the system fetches external content), and tool outputs (if tools touch attacker-influenced data). Each of these is a potential prompt injection vector regardless of how careful the system prompt is.

The wrong fix is to keep refining filters that try to detect malicious instructions in retrieved content. The detection arms race is unwinnable; defenders write filters, attackers write the next variant.

The right fix is architectural. We use three patterns:

Privilege-separated tools. The model can read from anywhere, but it cannot trigger high-privilege actions (database writes, external API calls, financial transactions, user-data modifications) directly from a turn that included untrusted retrieval. Privileged actions require a separate, smaller, instruction-locked model invocation that takes structured parameters extracted from the main flow. Attacker content can still influence what the model says; it cannot influence what the system does.
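A sketch of that separation, with hypothetical tool names and a saw_untrusted_content flag set whenever retrieval, web fetches, or tool output entered the turn; the privileged path is only reachable through a separate extraction step, never from the tainted turn itself:

```python
from dataclasses import dataclass

HIGH_PRIVILEGE_TOOLS = {"write_record", "send_payment", "update_user_data"}

@dataclass
class Turn:
    text: str
    saw_untrusted_content: bool   # True when retrieval or tool output entered the context

def run_tool(tool_name: str, params: dict) -> None:
    print(f"executing {tool_name} with {params}")   # stand-in for real tool execution

def dispatch_tool(tool_name: str, params: dict, turn: Turn) -> None:
    if tool_name in HIGH_PRIVILEGE_TOOLS and turn.saw_untrusted_content:
        # High-privilege actions never fire from a turn that read untrusted content.
        # They must be re-requested via a separate, instruction-locked invocation
        # that only accepts structured parameters extracted from the main flow.
        raise PermissionError(f"{tool_name} blocked: turn included untrusted content")
    run_tool(tool_name, params)
```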

Output structure schemas. Where the model's output drives downstream behaviour (tool selection, action arguments, query generation), require a strict JSON schema and reject anything that does not parse. The injection still happens; the injection's intended effect is contained because the structured-output channel does not honour free-form instructions.
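One way to enforce this, sketched here with Pydantic (an assumption, not a prescription): the schema admits only a closed set of tools, and anything that fails validation is discarded before it reaches the dispatcher.

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    tool: Literal["search_docs", "summarise"]   # closed set of permitted tools
    arguments: dict[str, str]

def parse_tool_call(raw_model_output: str) -> ToolCall | None:
    try:
        return ToolCall.model_validate_json(raw_model_output)
    except ValidationError:
        # Free-form "instructions" smuggled into the output fail the schema
        # and never reach downstream code.
        return None
```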

Provenance-tracked context. Every chunk of context that reaches the model carries a provenance tag (trusted system, trusted user, untrusted retrieved). Tools and downstream consumers can check provenance and refuse high-trust actions on low-trust content. This is the LLM equivalent of taint analysis.
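A minimal version of the taint-analysis idea, with illustrative provenance values; downstream consumers call the check before any high-trust action:

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    TRUSTED_SYSTEM = "trusted_system"
    TRUSTED_USER = "trusted_user"
    UNTRUSTED_RETRIEVED = "untrusted_retrieved"

@dataclass(frozen=True)
class ContextChunk:
    text: str
    provenance: Provenance

def high_trust_action_allowed(chunks: list[ContextChunk]) -> bool:
    # Taint rule: a single untrusted chunk in the context blocks high-trust actions.
    return all(c.provenance is not Provenance.UNTRUSTED_RETRIEVED for c in chunks)
```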

These three together close indirect prompt injection in production RAG. None of them are easy retrofits, which is why programmes that built the application first and added security later usually need significant refactoring.

The agentic systems risk surface

When the application becomes agentic — model with tools, multi-step reasoning, autonomous action — the threat surface expands in a specific way. The model's actions span time and tools, which means an attacker can plant an instruction now that triggers an action three turns later. This is the pattern current research calls delayed prompt injection.

The countermeasures we recommend for production agent systems:

  • Action budget per session. A hard ceiling on the number of high-privilege tool calls per agent session. An agent that hits the ceiling pauses for human review. Most realised harm in agent systems involves high tool-call volumes; capping the volume is a crude but effective check (see the sketch after this list).
  • Session memory bounded and provenance-tagged. The longer an agent's memory, the more time-displaced injections it can carry. Bound memory to the task and tag every memory entry with its source.
  • Tool result schemas and trust scopes. Same pattern as RAG: tools return structured outputs and carry trust tags. The model can reason about tool output but cannot ingest untrusted tool output as instructions.
  • Per-agent rate limits at the action layer. Not just at the API layer. An agent that suddenly fires 50 tool calls in 60 seconds is exhibiting either a bug or an attack; either way, the action layer should pause it.
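A sketch of the first and last controls together, assuming a hypothetical ActionBudget object wired in front of the agent's tool dispatcher; the thresholds are illustrative, not recommendations:

```python
import time

class ActionBudget:
    """Per-session ceiling on privileged tool calls plus a burst-rate check at the action layer."""

    def __init__(self, max_privileged_calls: int = 10,
                 burst_limit: int = 50, burst_window_s: float = 60.0):
        self.max_privileged_calls = max_privileged_calls
        self.burst_limit = burst_limit
        self.burst_window_s = burst_window_s
        self.privileged_calls = 0
        self.call_times: list[float] = []

    def record_call(self, privileged: bool) -> None:
        now = time.monotonic()
        # Keep only calls inside the burst window, then record this one.
        self.call_times = [t for t in self.call_times if now - t < self.burst_window_s]
        self.call_times.append(now)
        if privileged:
            self.privileged_calls += 1
        if self.privileged_calls > self.max_privileged_calls:
            raise RuntimeError("action budget exceeded: pause agent for human review")
        if len(self.call_times) > self.burst_limit:
            raise RuntimeError("burst rate exceeded at the action layer: pause agent")
```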

Agent systems are where the most novel security work lives in 2026. The categories on the LLM Top 10 still apply, but the realised threats compose differently when the model acts.

What is over-emphasised on the list

Three categories show up prominently in the LLM Top 10 but produce less realised harm than their position suggests:

LLM04 Data and Model Poisoning. Real but expensive to execute and rarely the cheapest path for an attacker. Most poisoning attacks in the wild target open-source models or fine-tuning data; if you control your fine-tuning data and have basic provenance hygiene on retrieved content, this is not your highest-priority work.

LLM06 Excessive Agency. Already covered in the agent systems section. The category name encourages thinking about it as a behavioural property when it is actually an architecture property. Build with privilege separation and the category mostly takes care of itself.

LLM10 Unbounded Consumption. A real problem (cost spirals, denial-of-wallet attacks), but one solved by standard rate limiting and budget alerts at the application and infrastructure layers. The risk is real; the work is not novel.

We frame the OWASP list this way with clients to direct attention to the categories that move the security posture the most, rather than diluting effort equally across all ten.

Where this connects to our practice

Pelican Tech's AI Solutions practice builds production GenAI systems with these architectures from the start: privilege-separated tools, schema-validated outputs, provenance-tracked context, action budgets for agentic systems. We work alongside our risk management team when the threat model needs to integrate with the broader cyber programme, and with our identity team when the per-request context-construction pattern requires identity-aware data access.

If your GenAI application is in production and the threat model has not been updated to address indirect prompt injection at the architecture level, that is the engagement to start with.