Prompt Injection Is a Boundary Failure
Architecture-level security analysis of prompt injection as a boundary-control failure across instruction, context, schema, session, retrieval, output, agency, and monitoring boundaries.
Mapping LLM application failure modes to OWASP and NIST risk categories
Prompt injection is often described as a malicious instruction that makes a model ignore its original prompt. That description is useful, but incomplete. In LLM applications, the security problem is broader than the prompt itself. The practical failure often appears when untrusted language reaches a control surface that the system treats as trusted: a system instruction, retrieved document, schema field, session state, tool argument, output parser, risk classifier, or reporting path.
OWASP defines prompt injection as a vulnerability where prompts alter an LLM’s behavior or output in unintended ways. OWASP also distinguishes direct prompt injection from indirect prompt injection, where external sources such as websites or files are interpreted by the model and alter its behavior. The impact depends on both the business context and the agency granted to the system. (OWASP Gen AI Security Project)
That framing shifts the security question.
The important question is not only:
Can a model be manipulated by language?
The stronger architecture question is:
What parts of the application are allowed to trust, route, execute, store, or escalate language once the model has processed it?
This article maps LLM application failure modes to OWASP and NIST risk categories. The goal is not to introduce a replacement taxonomy. The goal is to make prompt-injection-related failures easier to review at the system boundary level.
Scope and method
This article is an architecture-level security analysis. It maps observed LLM application failure patterns to OWASP and NIST categories rather than making vendor-specific vulnerability claims.
The focus is application design: how untrusted language can cross instruction, context, schema, session, output, retrieval, agency, and monitoring boundaries.
The analysis uses the following formal anchors:
- OWASP LLM01: Prompt Injection
- OWASP LLM05: Improper Output Handling
- OWASP LLM06: Excessive Agency
- OWASP LLM07: System Prompt Leakage
- OWASP LLM08: Vector and Embedding Weaknesses
- NIST AI Risk Management Framework
- NIST AI 600-1 Generative AI Profile
NIST describes the AI RMF as a voluntary framework intended to help organizations incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems. NIST AI 600-1 extends that frame specifically to generative AI and risk management across lifecycle stages. (NIST)
Core thesis
Prompt injection should be treated as a boundary-control failure in LLM applications, not only as malicious prompt text.
A prompt-injection event becomes materially risky when the application allows language to affect a trusted control surface:
| Boundary | Security question |
|---|---|
| Instruction boundary | Can untrusted text modify or override intended instructions? |
| Context boundary | Can retrieved or pasted content influence behavior beyond its intended reference role? |
| Format boundary | Can structured text pass schema checks while carrying unsafe semantics? |
| Session boundary | Can user input alter memory, tenant, role, or approval assumptions? |
| Output boundary | Can model output become trusted input to another system? |
| Retrieval boundary | Can external content enter the model context with the wrong authority? |
| Agency boundary | Can the model trigger tools, functions, or actions beyond safe scope? |
| Monitoring boundary | Can security-relevant behavior fail to surface to logs, review, or escalation? |
This boundary model matters because prompting alone is not a security control. OWASP’s mitigation guidance for prompt injection includes constraining behavior, validating output formats with deterministic code, filtering inputs and outputs, limiting privileges, requiring human approval for high-risk actions, segregating external content, and adversarial testing. (OWASP Gen AI Security Project)
Formal baseline
OWASP LLM01: Prompt Injection
LLM01 is the primary category for cases where input changes model behavior or output in unintended ways. It covers both direct prompt injection and indirect prompt injection through external sources. OWASP lists potential impacts including sensitive-information disclosure, exposure of system prompts, content manipulation, unauthorized access to functions, command execution in connected systems, and manipulation of critical decisions. (OWASP Gen AI Security Project)
For application security review, LLM01 should be read as more than a prompt-writing problem. It is a signal that the application may not be separating trusted instructions from untrusted text.
OWASP LLM05: Improper Output Handling
LLM05 covers insufficient validation, sanitization, and handling of outputs generated by large language models before they are passed downstream to other components and systems. (OWASP Gen AI Security Project)
This category is critical because LLM output is not automatically safe. It may be influenced by user input, retrieved content, or earlier model output. Once that output is passed to a browser, database, shell, workflow engine, ticketing system, email client, or tool call, the risk shifts from language generation to software execution and data handling.
OWASP LLM06: Excessive Agency
LLM06 applies when an LLM-based system has the ability to call functions, use extensions, or interface with other systems. OWASP describes excessive agency as a vulnerability that enables damaging actions in response to unexpected, ambiguous, or manipulated LLM outputs. OWASP identifies excessive functionality, excessive permissions, and excessive autonomy as typical root causes. (OWASP Gen AI Security Project)
This is the agentic layer of prompt injection risk. The same injected instruction has a different impact depending on whether the model can only return text or can also retrieve documents, send emails, modify records, call APIs, or trigger production workflows.
OWASP LLM07: System Prompt Leakage
LLM07 applies when system prompts or instructions contain sensitive information that was not intended to be discovered. OWASP states that system prompts should not be considered secrets or used as security controls. It also warns against placing credentials, connection strings, roles, permission structures, or sensitive data inside system prompts. (OWASP Gen AI Security Project)
This is a central point for LLM application architecture: the system prompt may steer behavior, but it should not enforce authorization, protect secrets, or replace deterministic access control.
OWASP LLM08: Vector and Embedding Weaknesses
LLM08 applies to systems using retrieval-augmented generation, embeddings, vector stores, or similar retrieval layers. OWASP summarizes this category as presenting significant security risks in systems that use RAG with large language models. (OWASP Gen AI Security Project)
This category is relevant when retrieved material becomes part of the model context. If the application does not separate retrieved data from trusted instructions, external content can influence the response path, the answer, or the downstream action.
Boundary failure modes
The following failure modes are application-level review labels. They are not official OWASP or NIST category names. Each label describes a boundary where untrusted language can cross into a trusted part of the system.
1. Format-Boundary Prompt Injection
Format-boundary prompt injection occurs when the application treats structured input or output as safer than ordinary language because it is wrapped in a formal format.
The format may be JSON, YAML, XML, Markdown, CSV, HTML, or a tool-call object. The security problem is not the format itself. The problem is assuming that format compliance equals semantic safety.
A JSON object can be syntactically valid and still contain unsafe instructions, misleading classifications, manipulated routing labels, or fields that should not be trusted downstream.
This maps primarily to:
- OWASP LLM01 when structured input changes model behavior.
- OWASP LLM05 when structured model output is passed downstream without sufficient validation.
The defensive principle: schema validation checks structure. It does not prove that the content is safe, authorized, or appropriate for the target context.
2. System Prompt Exposure Attempt
A system prompt exposure attempt is an interaction pattern that tries to reveal hidden instructions, internal rules, filtering criteria, role definitions, permission descriptions, or formatting constraints.
The risk is not merely that the prompt text becomes visible. OWASP is explicit that the system prompt itself should not be treated as a secret. The deeper issue is storing sensitive data, operational logic, or authorization assumptions in a place where they can be exposed or inferred. (OWASP Gen AI Security Project)
This maps primarily to:
- OWASP LLM07 when system instructions contain sensitive information.
- OWASP LLM01 when the same interaction attempts to alter model behavior.
- NIST AI RMF governance and measurement concerns when prompt content becomes part of application risk management.
The defensive principle: system prompts should steer model behavior, not hold secrets or enforce access control.
3. Session-State Boundary Reset
Session-state boundary reset occurs when an interaction causes the application to behave as if prior state, memory boundaries, user role, tenant context, or approval history no longer apply.
This is not automatically a prompt-injection vulnerability. It becomes security-relevant when the application uses conversational state as part of its security model.
Examples of security-relevant state include:
- user identity
- tenant boundary
- role or permission scope
- memory segment
- approval status
- policy state
- prior refusal or escalation state
This maps primarily to:
- OWASP LLM01 if user input changes model behavior around state.
- OWASP LLM06 if the altered state affects tool use or agent action.
- NIST AI RMF lifecycle risk management when state assumptions affect deployment safety.
The defensive principle: session, identity, authorization, tenant separation, and approval state must be enforced outside the model.
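A minimal Python sketch of that principle, assuming a hypothetical `Session` object populated from server-side authentication and a hypothetical role-to-tool table (`TOOL_PERMISSIONS`). The authorization decision reads identity, role, and tenant only from `Session`; anything the model proposes in tool arguments is treated as a request, not as state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    """Server-side session state, populated at authentication time, never from model text."""
    user_id: str
    tenant_id: str
    roles: frozenset

# Hypothetical mapping of tool names to the roles allowed to call them.
TOOL_PERMISSIONS = {
    "read_ticket": {"viewer", "agent"},
    "close_ticket": {"agent"},
}

def authorize_tool_call(session: Session, tool: str, args: dict) -> bool:
    """Authorize using only server-side state; identity claims inside `args` are ignored."""
    allowed_roles = TOOL_PERMISSIONS.get(tool)
    if allowed_roles is None:
        return False  # unknown tool: deny by default
    if not (session.roles & allowed_roles):
        return False  # role comes from the session, never from a "role" argument
    # A tenant_id in the model's arguments must match the session tenant;
    # it cannot move the call into another tenant.
    if args.get("tenant_id", session.tenant_id) != session.tenant_id:
        return False
    return True
```

With a viewer session, `authorize_tool_call(session, "close_ticket", {"ticket_id": 7, "role": "agent"})` returns `False`: the injected `role` argument is never consulted.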
4. Schema-Constrained Prompt Injection
Schema-constrained prompt injection occurs when a model is required to output a valid schema, but untrusted language still controls the semantic meaning of fields used by downstream systems.
The application may ask the model to return:
- a risk score
- a classification label
- a tool argument
- a routing decision
- a policy category
- a structured summary
- a remediation instruction
The schema may be valid. The values may still be unsafe.
This maps primarily to:
- OWASP LLM01 when input changes the model’s decision.
- OWASP LLM05 when the structured output is trusted by another component.
- OWASP LLM06 if structured output triggers tools or actions.
The defensive principle: schema validation should be paired with semantic validation, allowlists, authorization checks, and context-specific safety checks.
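A minimal sketch of layered validation for a structured triage output, assuming hypothetical field names, allowlists, and a `caller_can_close` permission flag supplied by the application. The hand-written shape check stands in for a JSON Schema validator; the semantic and authorization steps after it are the part a schema alone does not cover.

```python
import json

# Hypothetical allowlists for a support-triage output schema.
ALLOWED_ROUTES = {"billing", "technical", "security_review"}
ALLOWED_ACTIONS = {"none", "reply", "close_ticket"}
EXPECTED_FIELDS = {"route": str, "risk_score": int, "action": str}

class ValidationError(Exception):
    """Raised when model output fails structural, semantic, or authorization checks."""

def validate_triage(raw: str, caller_can_close: bool) -> dict:
    # 1. Syntax: the output must parse as JSON.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not JSON: {exc}") from exc
    # 2. Shape: exact field set and exact types (type() also rejects bool posing as int).
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        raise ValidationError("unexpected or missing fields")
    for name, expected_type in EXPECTED_FIELDS.items():
        if type(data[name]) is not expected_type:
            raise ValidationError(f"{name}: wrong type")
    # 3. Semantics: values must come from allowlists and sane ranges.
    if data["route"] not in ALLOWED_ROUTES:
        raise ValidationError("route not in allowlist")
    if not 0 <= data["risk_score"] <= 100:
        raise ValidationError("risk_score out of range")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValidationError("action not in allowlist")
    # 4. Authorization: the caller, not the model, must hold the permission.
    if data["action"] == "close_ticket" and not caller_can_close:
        raise ValidationError("caller not authorized to close tickets")
    return data
```

An output such as `{"route": "admin_override", "risk_score": 20, "action": "reply"}` is valid JSON with the right shape, and is still rejected at step 3.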
5. Risk-Routing Manipulation
Risk-routing manipulation occurs when user-controlled language influences how the system classifies risk, chooses a response path, escalates a case, suppresses a case, or routes the interaction into a different control mode.
This failure mode should be framed carefully. It is not a standalone OWASP category. It is an application-level control failure that may appear through prompt injection, output handling, or excessive agency.
This maps primarily to:
- OWASP LLM01 if input changes model behavior.
- OWASP LLM05 if the model’s risk label is consumed downstream.
- NIST AI RMF measurement and management functions when risk classification affects governance or operational controls.
The defensive principle: risk routing should not rely only on model self-classification. High-impact routing should use deterministic policy logic, independent checks, and audit trails.
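One way to keep routing independent of model self-classification, sketched in Python under stated assumptions: the rule names in `rule_floor` and the four-level `SEVERITY` scale are hypothetical. Deterministic rules set a severity floor from facts the application already knows; the model label may raise severity above that floor but never lower it, and an unrecognized label fails closed.

```python
SEVERITY = ["low", "medium", "high", "critical"]

# Hypothetical deterministic rules over facts the application knows independently of the model.
def rule_floor(event: dict) -> str:
    if event.get("touches_payment_data"):
        return "high"
    if event.get("requested_tool") in {"delete_record", "send_external_email"}:
        return "high"
    if event.get("validation_failed"):
        return "medium"
    return "low"

def route(event: dict, model_label: str) -> dict:
    floor = rule_floor(event)
    recognized = model_label in SEVERITY
    # An unrecognized label (including injected free text) is treated as "high".
    effective_model = model_label if recognized else "high"
    # The model may raise severity above the rule floor, never lower it.
    final = max(floor, effective_model, key=SEVERITY.index)
    # Return every input to the decision so it can be written to an audit trail.
    return {
        "severity": final,
        "rule_floor": floor,
        "model_label": model_label,
        "model_label_recognized": recognized,
    }
```

The asymmetry is the design choice: manipulated language can at worst cause over-escalation, which is visible and reviewable, rather than silent suppression.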
6. Echoed Output Contamination
Echoed output contamination occurs when the model repeats user-provided or externally retrieved content, and a downstream component treats the repeated content as trusted because it came from the model.
The issue is trust transfer.
Untrusted input becomes model output. Model output becomes an email, rendered artifact, tool argument, database value, UI component, support ticket, policy label, or workflow instruction.
This maps primarily to:
- OWASP LLM05 because the risk appears when model output is insufficiently validated before downstream use.
- OWASP LLM01 if the echoed content was designed to influence model behavior.
- OWASP LLM06 if the echoed output triggers action through a tool or extension.
The defensive principle: model output should be treated as untrusted until validated for the specific target context.
7. Retrieved Context Contamination
Retrieved context contamination occurs when external or retrieved content enters the model context and influences behavior beyond its intended role as reference material.
This is the architecture-level version of indirect prompt injection. OWASP describes indirect prompt injection as a case where an LLM accepts input from external sources such as websites or files, and the content alters model behavior when interpreted by the model. (OWASP Gen AI Security Project)
In RAG systems, the retrieval layer becomes part of the security boundary. Retrieved text may be authoritative, outdated, poisoned, irrelevant, unauthorized, or adversarially written.
This maps primarily to:
- OWASP LLM01 for indirect prompt injection.
- OWASP LLM08 for vector, embedding, and retrieval weaknesses.
- NIST AI RMF mapping and measurement functions for lifecycle-level risk management.
The defensive principle: retrieved content should be labeled as untrusted data, scoped by access control, separated from instructions, and tested for injection behavior.
8. Monitoring and Reporting Control Failure
Monitoring and reporting control failure occurs when security-relevant behavior is not logged, surfaced, classified, or routed to the correct review path.
This failure mode is not a standalone OWASP LLM Top 10 category. It is a control-layer weakness. It can amplify other categories when prompt injection, unsafe output, excessive agency, or retrieval contamination occurs but is not detected.
This maps primarily to:
- OWASP LLM01 when injection behavior is not detected.
- OWASP LLM05 when unsafe output is not flagged.
- OWASP LLM06 when tool actions are not reviewed.
- NIST AI RMF governance, measurement, and management functions.
NIST AI 600-1 frames generative AI (GAI) risk management across lifecycle stages and includes suggested actions for managing GAI risks. (NIST Publications)
The defensive principle: LLM applications need observability for prompt assembly, retrieval source, model output, validation result, tool selection, policy routing, and human approval.
9. Fail-Open Fallback Behavior
Fail-open fallback behavior occurs when validation, policy checking, retrieval, classification, or tool-selection logic fails, but the system continues with a permissive default.
This is not unique to LLM systems. It becomes more difficult to detect in LLM applications because the model may still produce a fluent and confident response even when the system lacks validated context.
Examples include:
- retrieval failed, but the model answers anyway
- validation failed, but the output is still sent downstream
- risk classification is uncertain, but the system continues as low risk
- tool selection fails, but the fallback tool has broader permissions
- policy state is unavailable, but the response path remains permissive
This maps primarily to:
- OWASP LLM05 if fallback output is consumed downstream.
- OWASP LLM06 if fallback behavior enables action.
- NIST AI RMF risk management and measurement controls.
The defensive principle: high-impact paths should fail closed. The system should distinguish between “the model produced an answer” and “the application had sufficient validated context to proceed.”
10. Synthetic Risk-Routing Manipulation
Synthetic risk-routing manipulation occurs when language causes the system to over-escalate, under-escalate, or misclassify an interaction because the model or classifier reacts to surface-level signals rather than validated context.
This failure mode is best treated as a testing category, not as a proven vulnerability class by default.
It is relevant for systems that use LLMs or classifiers to assign:
- risk level
- policy category
- review queue
- escalation route
- safety mode
- response template
- tool availability
This maps primarily to:
- OWASP LLM01 if user input changes model behavior.
- OWASP LLM05 if the model-generated classification is consumed downstream.
- NIST AI RMF measurement and management functions when classification quality affects risk controls.
The defensive principle: risk classification should be evaluated for false negatives, false positives, routing drift, and evidence quality. The goal is not to maximize blocking. The goal is to route the right event to the right control.
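A small evaluation harness can make those measurements concrete. This is a sketch under assumptions: `classify` is any callable returning an escalate/no-escalate decision, `Case` holds hypothetical labeled test inputs whose ground truth comes from reviewed evidence, and "routing drift" is approximated as the change in error rates between two runs of the same suite.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Case:
    text: str
    should_escalate: bool  # ground truth from reviewed evidence, not from the classifier

def evaluate(classify: Callable[[str], bool], cases: List[Case]) -> dict:
    tp = fp = fn = tn = 0
    errors = []
    for case in cases:
        predicted = classify(case.text)
        if predicted and case.should_escalate:
            tp += 1
        elif predicted:
            fp += 1
            errors.append(("false_positive", case.text))
        elif case.should_escalate:
            fn += 1
            errors.append(("false_negative", case.text))
        else:
            tn += 1
    positives, negatives = tp + fn, fp + tn
    return {
        "false_negative_rate": fn / positives if positives else 0.0,
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "errors": errors,
    }

def routing_drift(baseline: dict, current: dict) -> dict:
    """Change in error rates between two runs of the same suite (e.g. before and after a prompt change)."""
    keys = ("false_negative_rate", "false_positive_rate")
    return {k: current[k] - baseline[k] for k in keys}
```

Run against a deliberately naive classifier such as `lambda t: "urgent" in t.lower()`, the harness shows the surface-signal problem directly: a quietly worded exfiltration request is a false negative, and an "urgent" question about a lunch menu is a false positive.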
Mapping table
| Failure mode | Primary boundary | OWASP mapping | NIST mapping | Control objective |
|---|---|---|---|---|
| Format-Boundary Prompt Injection | Format / schema | LLM01, LLM05 | Measure / Manage | Validate semantics, not only syntax |
| System Prompt Exposure Attempt | Instruction | LLM07, LLM01 | Govern / Manage | Keep secrets and authorization outside prompts |
| Session-State Boundary Reset | Session / memory | LLM01, LLM06 | Map / Manage | Enforce state outside the model |
| Schema-Constrained Prompt Injection | Schema / output | LLM01, LLM05, LLM06 | Measure / Manage | Validate structured output before use |
| Risk-Routing Manipulation | Classification / routing | LLM01, LLM05 | Measure / Manage | Use independent routing controls |
| Echoed Output Contamination | Output | LLM05, LLM01 | Measure / Manage | Treat model output as untrusted |
| Retrieved Context Contamination | Retrieval / context | LLM01, LLM08 | Map / Measure | Separate retrieved data from instructions |
| Monitoring and Reporting Control Failure | Observability | LLM01, LLM05, LLM06 | Govern / Measure / Manage | Log and surface security-relevant behavior |
| Fail-Open Fallback Behavior | Fallback / control flow | LLM05, LLM06 | Manage | Fail closed on uncertainty |
| Synthetic Risk-Routing Manipulation | Risk routing | LLM01, LLM05 | Measure / Manage | Test routing quality and evidence alignment |
Defensive architecture implications
The defensive answer is not to write a stronger hidden prompt. A stronger system prompt may improve behavior, but it should not be treated as the security boundary.
The stronger architecture is layered.
1. Separate trusted instructions from untrusted content
The application should maintain a clear separation between:
- system and developer instructions
- user input
- retrieved documents
- tool output
- prior conversation
- model-generated text
The model may receive all of these as text, but the application should not treat all text as equal authority.
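A minimal sketch of provenance-typed prompt assembly, assuming a hypothetical `Source` enum and `Segment` type. Authority is decided by the tag attached when text enters the application, never by what the text says. Separating channels does not by itself stop a model from following text in the data channel; it keeps provenance explicit so validators, logs, and reviewers can tell which text carried authority.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Source(Enum):
    SYSTEM = "system"
    DEVELOPER = "developer"
    USER = "user"
    RETRIEVED = "retrieved"
    TOOL_OUTPUT = "tool_output"
    HISTORY = "history"
    MODEL = "model"

# Only these sources may occupy the instruction channel.
INSTRUCTION_SOURCES = {Source.SYSTEM, Source.DEVELOPER}

@dataclass(frozen=True)
class Segment:
    source: Source
    text: str

def assemble(segments: List[Segment]) -> dict:
    """Split segments by provenance. Authority follows the Source tag, never the wording."""
    instructions = [s.text for s in segments if s.source in INSTRUCTION_SOURCES]
    data = [
        {"source": s.source.value, "content": s.text}
        for s in segments
        if s.source not in INSTRUCTION_SOURCES
    ]
    return {"instructions": "\n".join(instructions), "data": data}
```

A retrieved document that says "Ignore prior rules" still lands in `data` with `"source": "retrieved"`, however authoritative it sounds.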
2. Keep secrets and authorization outside prompts
OWASP explicitly recommends avoiding sensitive data in system prompts and enforcing critical controls such as privilege separation and authorization checks independently from the LLM. (OWASP Gen AI Security Project)
System prompts should not contain:
- credentials
- connection strings
- API keys
- database names when sensitive
- permission structures
- operational secrets
- hidden authorization logic
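A pre-deployment lint can catch the most mechanical of these mistakes. This is a sketch only: the patterns below are illustrative assumptions, not an exhaustive secret detector, and a dedicated secret-scanning tool would be more thorough. Pattern matching cannot find permission structures or hidden authorization logic written in prose; those still need human review.

```python
import re
from typing import List

# Illustrative patterns only; expect both misses and false positives.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "connection_string": re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/]+:[^\s@/]+@", re.IGNORECASE),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(r"\b(password|api[_-]?key|secret)\s*[:=]\s*\S+", re.IGNORECASE),
}

def lint_system_prompt(prompt: str) -> List[str]:
    """Return the names of secret-like patterns found in a system prompt."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt))
```

A non-empty result would block the prompt change in CI; an empty result means only that no credential-shaped string was found.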
3. Validate output before downstream use
LLM output should not be passed directly into execution contexts, rendering contexts, workflow engines, tool calls, or database operations without validation.
The validation should be specific to the target context. HTML rendering, SQL generation, file writing, email sending, policy routing, and API calls require different controls.
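The same model text needs a different control at each destination. A minimal sketch using only the Python standard library (`html.escape` and `sqlite3` parameter binding); the table name, function names, and the subject length cap are hypothetical.

```python
import html
import sqlite3

def to_html(model_text: str) -> str:
    """HTML context: escape markup so model text renders as text, not as elements or script."""
    return "<p>" + html.escape(model_text) + "</p>"

def store_summary(conn: sqlite3.Connection, ticket_id: int, model_text: str) -> None:
    """SQL context: bind model text as a parameter; never splice it into the query string."""
    conn.execute("INSERT INTO summaries (ticket_id, body) VALUES (?, ?)", (ticket_id, model_text))

def email_subject(model_text: str) -> str:
    """Email header context: reject CR/LF so model text cannot add headers such as Bcc."""
    if "\r" in model_text or "\n" in model_text:
        raise ValueError("line break in header value")
    return model_text[:200]  # hypothetical length cap
```

None of these functions asks whether the text is "malicious". Each one enforces the rules of its own target context, which is what makes the output safe to hand over.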
4. Limit agency
For agentic systems, the highest-risk question is not whether the model can answer incorrectly. It is what the system allows the model to do after answering.
OWASP identifies excessive functionality, excessive permissions, and excessive autonomy as root causes of excessive agency. (OWASP Gen AI Security Project)
Defensive design should include:
- narrow tool scope
- least privilege
- no unnecessary extensions
- no open-ended shell/browser/database tools where avoidable
- human approval for high-impact actions
- separate agents or services for different privilege levels
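A sketch of a least-privilege tool dispatcher, assuming a hypothetical registry, impact tiers, and an approval threshold chosen for illustration. Tool execution is stubbed out; the point is the order of checks: registered tool, allowed arguments, then human approval for anything at or above the threshold.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Impact(IntEnum):
    READ = 1
    WRITE = 2
    EXTERNAL = 3  # leaves the system: email, payments, third-party APIs

APPROVAL_THRESHOLD = Impact.WRITE  # hypothetical policy: any write needs a human

@dataclass(frozen=True)
class ToolSpec:
    impact: Impact
    allowed_args: frozenset

# Hypothetical registry: narrow, task-specific tools only; no shell, browser, or raw SQL.
REGISTRY = {
    "search_kb": ToolSpec(Impact.READ, frozenset({"query"})),
    "update_ticket_status": ToolSpec(Impact.WRITE, frozenset({"ticket_id", "status"})),
    "send_customer_email": ToolSpec(Impact.EXTERNAL, frozenset({"ticket_id", "body"})),
}

def dispatch(tool: str, args: dict, approved_by: Optional[str] = None) -> str:
    spec = REGISTRY.get(tool)
    if spec is None:
        return "denied: tool not registered"
    if not set(args) <= spec.allowed_args:
        return "denied: unexpected arguments"
    if spec.impact >= APPROVAL_THRESHOLD and approved_by is None:
        return "pending: human approval required"
    # Execution is stubbed; a real dispatcher would call a service running
    # with credentials scoped to this one tool.
    return f"executed: {tool}"
```

Because the registry is the only path to execution, an injected request for a shell or an extra argument fails before any model judgment is involved.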
5. Treat retrieved content as untrusted
RAG does not eliminate prompt injection risk. OWASP states that RAG and fine-tuning may improve relevance and accuracy, but they do not fully mitigate prompt-injection vulnerabilities. (OWASP Gen AI Security Project)
Retrieved content should be:
- access-controlled
- source-attributed
- scoped to the user and task
- separated from instructions
- filtered for injection-like content
- excluded from tool-control decisions unless independently validated
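A minimal sketch of preparing retrieved chunks before they reach the model, assuming a hypothetical `Chunk` type with group-based access labels. Access control runs before the model sees anything; each surviving chunk carries its source and an explicit `reference_only` authority tag. The injection check is a crude regex heuristic, included only as a logging and review signal: it will miss most real injections and flag some benign text.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    source_url: str
    allowed_groups: frozenset
    text: str

# Crude heuristic for instruction-like text aimed at the model; a signal, not a defense.
INJECTION_HINTS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|you are now|system prompt",
    re.IGNORECASE,
)

def prepare_context(chunks: List[Chunk], user_groups: frozenset) -> List[dict]:
    prepared = []
    for chunk in chunks:
        if not (chunk.allowed_groups & user_groups):
            continue  # access control happens before the model sees the text
        prepared.append({
            "authority": "reference_only",  # never an instruction, never a tool trigger
            "doc_id": chunk.doc_id,
            "source": chunk.source_url,
            "suspected_injection": bool(INJECTION_HINTS.search(chunk.text)),
            "text": chunk.text,
        })
    return prepared
```

A downstream tool dispatcher would then refuse any action whose only justification is an item tagged `reference_only`.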
6. Make routing auditable
If the model assigns a risk label, policy category, tool route, escalation state, or workflow decision, that decision should be observable.
A secure review path should be able to answer:
- what input was used
- what retrieved context was included
- what instruction hierarchy was applied
- what output was generated
- what validator accepted or rejected it
- what tool was selected
- what policy path was triggered
- what human approval was required
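Those questions map naturally onto one structured record per routed request. A sketch under assumptions: the field names are hypothetical, and the record stores SHA-256 digests instead of raw text on the assumption that raw prompts and outputs live in a separate, access-controlled store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

def digest(text: str) -> str:
    """Stable fingerprint; the raw text can be kept in a separate, access-controlled store."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass
class RoutingAuditRecord:
    request_id: str
    input_digest: str               # what input was used
    retrieved_doc_ids: List[str]    # what retrieved context was included
    instruction_sources: List[str]  # what instruction hierarchy was applied
    output_digest: str              # what output was generated
    validator: str                  # which validator ran
    validation_passed: bool         # whether it accepted or rejected the output
    tool_selected: Optional[str]    # what tool was selected
    policy_path: str                # what policy path was triggered
    human_approval_required: bool   # what human approval was required
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def to_log_line(record: RoutingAuditRecord) -> str:
    """One JSON object per line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

If a field cannot be filled in at request time, that gap is itself a finding for the architecture review.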
7. Fail closed on uncertainty
A model can generate fluent output even when the system lacks evidence. That is a product behavior issue and a security issue.
High-impact workflows should not proceed when:
- retrieval fails
- source confidence is insufficient
- validation fails
- policy status is unknown
- session state is inconsistent
- tool arguments cannot be verified
- user authorization cannot be confirmed
The system should expose uncertainty to deterministic controls instead of converting it into a normal answer.
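A minimal fail-closed gate covering the conditions listed above, sketched with a hypothetical `Preconditions` record and confidence threshold. The model's answer is deliberately not an input to the proceed/block decision, which keeps "the model produced an answer" separate from "the application had sufficient validated context to proceed".

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_SOURCE_CONFIDENCE = 0.8  # hypothetical threshold

@dataclass(frozen=True)
class Preconditions:
    retrieval_succeeded: bool
    source_confidence: float
    validation_passed: bool
    policy_status_known: bool
    session_consistent: bool
    tool_args_verified: bool
    user_authorized: bool

def gate(p: Preconditions) -> Tuple[bool, List[str]]:
    failures = []
    if not p.retrieval_succeeded:
        failures.append("retrieval_failed")
    # Written as `not (x >= t)` so a NaN confidence also fails closed.
    if not (p.source_confidence >= MIN_SOURCE_CONFIDENCE):
        failures.append("insufficient_source_confidence")
    if not p.validation_passed:
        failures.append("validation_failed")
    if not p.policy_status_known:
        failures.append("policy_status_unknown")
    if not p.session_consistent:
        failures.append("session_inconsistent")
    if not p.tool_args_verified:
        failures.append("tool_args_unverified")
    if not p.user_authorized:
        failures.append("user_not_authorized")
    return (not failures, failures)

def handle(p: Preconditions, model_answer: str) -> dict:
    """The model's answer is never an input to the proceed/block decision."""
    ok, failures = gate(p)
    if not ok:
        return {"status": "blocked", "reasons": failures}
    return {"status": "proceed", "answer": model_answer}
```

The blocked result carries explicit reasons, so the uncertainty reaches monitoring and escalation instead of being absorbed into a fluent answer.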
Architecture review checklist
| Area | Review question |
|---|---|
| Prompt assembly | Which parts of the final prompt are trusted instructions and which are untrusted data? |
| Instruction hierarchy | Can user or retrieved text override system/developer instructions? |
| System prompt | Does the system prompt contain secrets, permissions, credentials, or internal security logic? |
| Retrieval | Can external content influence instructions, tool calls, or policy decisions? |
| Schema | Does the schema validate only shape, or also semantic safety and downstream impact? |
| Output handling | Where does model output go after generation, and is it validated for that destination? |
| Tool use | What tools can the model call, and are they limited by least privilege? |
| Session state | Is identity, tenant, role, memory, or approval state enforced outside the model? |
| Risk routing | Can user-controlled language change severity, escalation, or control path? |
| Monitoring | Are suspicious retrievals, validation failures, tool calls, and routing decisions logged? |
| Fallbacks | Does the system fail closed when context, validation, or policy state is uncertain? |
Practical review model
A useful review process is to inspect the LLM application as a request assembly and control-flow system.
For each user request, identify:
- What the user provided.
- What the application retrieved.
- What the system inserted.
- What the model generated.
- What the application parsed.
- What downstream system consumed.
- What action became possible.
- What was logged or escalated.
The security review should focus on the transitions:
- user text → model instruction space
- retrieved content → trusted context
- model output → structured object
- structured object → tool argument
- model classification → policy route
- fallback output → downstream workflow
- untrusted content → stored memory
These transitions are where language becomes control.
Conclusion
Prompt injection is not only a prompt problem. It is a system boundary problem.
The model processes language. The application decides what that language can influence.
That distinction is the core security issue.
If untrusted text can alter instructions, contaminate retrieved context, shape structured output, reset session assumptions, trigger tools, affect risk routing, or bypass monitoring, the failure is architectural. It cannot be solved only by hiding stronger instructions inside the prompt.
OWASP provides the LLM application risk categories. NIST provides the broader risk-management frame. The boundary model connects both to implementation review.
The practical security standard is clear:
- do not place secrets or authorization logic in prompts
- do not treat model output as trusted
- do not let retrieved text act as instruction
- do not delegate high-impact decisions to unverified model output
- do not give agents broad tools, permissions, or autonomy
- do not let failures proceed silently
The model is not the security boundary.
The application architecture is.
Suggested reading
- The Attack Surface Starts Before Agents — The LLM Boundary
- Web-Retrieved Content Is a Prompt-Injection Boundary in Tool-Using LLM Systems
- Request assembly threat model: reading the diagram
- Prompt Assembly Policy Enforcement: Typed Provenance to Prevent Authority Confusion
- How Agentic Control-Plane Failures Actually Happen
- Engineering Quality Gate — Procedure
References
- OWASP Gen AI Security Project — LLM01:2025 Prompt Injection
- OWASP Gen AI Security Project — LLM05:2025 Improper Output Handling
- OWASP Gen AI Security Project — LLM06:2025 Excessive Agency
- OWASP Gen AI Security Project — LLM07:2025 System Prompt Leakage
- OWASP Gen AI Security Project — LLM08:2025 Vector and Embedding Weaknesses
- NIST AI 100-1 — Artificial Intelligence Risk Management Framework (AI RMF 1.0)
- NIST AI 600-1 — Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile