Prompt Injection Is a Boundary Failure

By Tamar Peretz Published 2026-06-21

Architecture-level security analysis of prompt injection as a boundary-control failure across instruction, context, schema, session, retrieval, output, agency, and monitoring boundaries.

Mapping LLM application failure modes to OWASP and NIST risk categories

Prompt injection is often described as a malicious instruction that makes a model ignore its original prompt. That description is useful, but incomplete. In LLM applications, the security problem is broader than the prompt itself. The practical failure often appears when untrusted language reaches a control surface that the system treats as trusted: a system instruction, retrieved document, schema field, session state, tool argument, output parser, risk classifier, or reporting path.

OWASP defines prompt injection as a vulnerability where prompts alter an LLM’s behavior or output in unintended ways. OWASP also distinguishes direct prompt injection from indirect prompt injection, where external sources such as websites or files are interpreted by the model and alter its behavior. The impact depends on both the business context and the agency granted to the system. (OWASP Gen AI Security Project)

That framing shifts the security question.

The important question is not only:

Can a model be manipulated by language?

The stronger architecture question is:

What parts of the application are allowed to trust, route, execute, store, or escalate language once the model has processed it?

This article maps LLM application failure modes to OWASP and NIST risk categories. The goal is not to introduce a replacement taxonomy. The goal is to make prompt-injection-related failures easier to review at the system boundary level.

Scope and method

This article is an architecture-level security analysis. It maps observed LLM application failure patterns to OWASP and NIST categories rather than making vendor-specific vulnerability claims.

The focus is application design: how untrusted language can cross instruction, context, schema, session, output, retrieval, agency, and monitoring boundaries.

The analysis uses the following formal anchors:

OWASP LLM01: Prompt Injection
OWASP LLM05: Improper Output Handling
OWASP LLM06: Excessive Agency
OWASP LLM07: System Prompt Leakage
OWASP LLM08: Vector and Embedding Weaknesses
NIST AI Risk Management Framework
NIST AI 600-1 Generative AI Profile

NIST describes the AI RMF as a voluntary framework intended to help organizations incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems. NIST AI 600-1 extends that frame specifically to generative AI and risk management across lifecycle stages. (NIST)

Core thesis

Prompt injection should be treated as a boundary-control failure in LLM applications, not only as malicious prompt text.

A prompt-injection event becomes materially risky when the application allows language to affect a trusted control surface:

Boundary	Security question
Instruction boundary	Can untrusted text modify or override intended instructions?
Context boundary	Can retrieved or pasted content influence behavior beyond its intended reference role?
Format boundary	Can structured text pass schema checks while carrying unsafe semantics?
Session boundary	Can user input alter memory, tenant, role, or approval assumptions?
Output boundary	Can model output become trusted input to another system?
Retrieval boundary	Can external content enter the model context with the wrong authority?
Agency boundary	Can the model trigger tools, functions, or actions beyond safe scope?
Monitoring boundary	Can security-relevant behavior fail to surface to logs, review, or escalation?

This boundary model matters because prompting alone is not a security control. OWASP’s mitigation guidance for prompt injection includes constraining behavior, validating output formats with deterministic code, filtering inputs and outputs, limiting privileges, requiring human approval for high-risk actions, segregating external content, and adversarial testing. (OWASP Gen AI Security Project)

Formal baseline

OWASP LLM01: Prompt Injection

LLM01 is the primary category for cases where input changes model behavior or output in unintended ways. It covers both direct prompt injection and indirect prompt injection through external sources. OWASP lists potential impacts including sensitive-information disclosure, exposure of system prompts, content manipulation, unauthorized access to functions, command execution in connected systems, and manipulation of critical decisions. (OWASP Gen AI Security Project)

For application security review, LLM01 should be read as more than a prompt-writing problem. It is a signal that the application may not be separating trusted instructions from untrusted text.

OWASP LLM05: Improper Output Handling

LLM05 covers insufficient validation, sanitization, and handling of LLM-generated output. OWASP’s current LLM Top 10 material summarizes the category as improper handling of outputs generated by large language models. (OWASP Gen AI Security Project)

This category is critical because LLM output is not automatically safe. It may be influenced by user input, retrieved content, or earlier model output. Once that output is passed to a browser, database, shell, workflow engine, ticketing system, email client, or tool call, the risk shifts from language generation to software execution and data handling.

OWASP LLM06: Excessive Agency

LLM06 applies when an LLM-based system has the ability to call functions, use extensions, or interface with other systems. OWASP describes excessive agency as a vulnerability that enables damaging actions in response to unexpected, ambiguous, or manipulated LLM outputs. OWASP identifies excessive functionality, excessive permissions, and excessive autonomy as typical root causes. (OWASP Gen AI Security Project)

This is the agentic layer of prompt injection risk. The same injected instruction has a different impact depending on whether the model can only answer text, retrieve documents, send emails, modify records, call APIs, or trigger production workflows.

OWASP LLM07: System Prompt Leakage

LLM07 applies when system prompts or instructions contain sensitive information that was not intended to be discovered. OWASP states that system prompts should not be considered secrets or used as security controls. It also warns against placing credentials, connection strings, roles, permission structures, or sensitive data inside system prompts. (OWASP Gen AI Security Project)

This is a central point for LLM application architecture: the system prompt may steer behavior, but it should not enforce authorization, protect secrets, or replace deterministic access control.

OWASP LLM08: Vector and Embedding Weaknesses

LLM08 applies to systems using retrieval-augmented generation, embeddings, vector stores, or similar retrieval layers. OWASP summarizes this category as significant security risk in systems using RAG with large language models. (OWASP Gen AI Security Project)

This category is relevant when retrieved material becomes part of the model context. If the application does not separate retrieved data from trusted instructions, external content can influence the response path, the answer, or the downstream action.

Boundary failure modes

The following failure modes are application-level review labels. They are not official OWASP or NIST category names. Each label describes a boundary where untrusted language can cross into a trusted part of the system.

1. Format-Boundary Prompt Injection

Format-boundary prompt injection occurs when the application treats structured input or output as safer than ordinary language because it is wrapped in a formal format.

The format may be JSON, YAML, XML, Markdown, CSV, HTML, or a tool-call object. The security problem is not the format itself. The problem is assuming that format compliance equals semantic safety.

A JSON object can be syntactically valid and still contain unsafe instructions, misleading classifications, manipulated routing labels, or fields that should not be trusted downstream.

This maps primarily to:

OWASP LLM01 when structured input changes model behavior.
OWASP LLM05 when structured model output is passed downstream without sufficient validation.

The defensive principle: schema validation checks structure. It does not prove that the content is safe, authorized, or appropriate for the target context.

2. System Prompt Exposure Attempt

A system prompt exposure attempt is an interaction pattern that tries to reveal hidden instructions, internal rules, filtering criteria, role definitions, permission descriptions, or formatting constraints.

The risk is not merely that the prompt text becomes visible. OWASP is explicit that the system prompt itself should not be treated as a secret. The deeper issue is storing sensitive data, operational logic, or authorization assumptions in a place where they can be exposed or inferred. (OWASP Gen AI Security Project)

This maps primarily to:

OWASP LLM07 when system instructions contain sensitive information.
OWASP LLM01 when the same interaction attempts to alter model behavior.
NIST AI RMF governance and measurement concerns when prompt content becomes part of application risk management.

The defensive principle: system prompts should steer model behavior, not hold secrets or enforce access control.

3. Session-State Boundary Reset

Session-state boundary reset occurs when an interaction causes the application to behave as if prior state, memory boundaries, user role, tenant context, or approval history no longer applies.

This is not automatically a prompt-injection vulnerability. It becomes security-relevant when the application uses conversational state as part of its security model.

Examples of security-relevant state include:

user identity
tenant boundary
role or permission scope
memory segment
approval status
policy state
prior refusal or escalation state

This maps primarily to:

OWASP LLM01 if user input changes model behavior around state.
OWASP LLM06 if the altered state affects tool use or agent action.
NIST AI RMF lifecycle risk management when state assumptions affect deployment safety.

The defensive principle: session, identity, authorization, tenant separation, and approval state must be enforced outside the model.

4. Schema-Constrained Prompt Injection

Schema-constrained prompt injection occurs when a model is required to output a valid schema, but untrusted language still controls the semantic meaning of fields used by downstream systems.

The application may ask the model to return:

a risk score
a classification label
a tool argument
a routing decision
a policy category
a structured summary
a remediation instruction

The schema may be valid. The values may still be unsafe.

This maps primarily to:

OWASP LLM01 when input changes the model’s decision.
OWASP LLM05 when the structured output is trusted by another component.
OWASP LLM06 if structured output triggers tools or actions.

The defensive principle: schema validation should be paired with semantic validation, allowlists, authorization checks, and context-specific safety checks.

5. Risk-Routing Manipulation

Risk-routing manipulation occurs when user-controlled language influences how the system classifies risk, chooses a response path, escalates a case, suppresses a case, or routes the interaction into a different control mode.

This failure mode should be framed carefully. It is not a standalone OWASP category. It is an application-level control failure that may appear through prompt injection, output handling, or excessive agency.

This maps primarily to:

OWASP LLM01 if input changes model behavior.
OWASP LLM05 if the model’s risk label is consumed downstream.
NIST AI RMF measurement and management functions when risk classification affects governance or operational controls.

The defensive principle: risk routing should not rely only on model self-classification. High-impact routing should use deterministic policy logic, independent checks, and audit trails.

6. Echoed Output Contamination

Echoed output contamination occurs when the model repeats user-provided or externally retrieved content, and a downstream component treats the repeated content as trusted because it came from the model.

The issue is trust transfer.

Untrusted input becomes model output. Model output becomes an email, rendered artifact, tool argument, database value, UI component, support ticket, policy label, or workflow instruction.

This maps primarily to:

OWASP LLM05 because the risk appears when model output is insufficiently validated before downstream use.
OWASP LLM01 if the echoed content was designed to influence model behavior.
OWASP LLM06 if the echoed output triggers action through a tool or extension.

The defensive principle: model output should be treated as untrusted until validated for the specific target context.

7. Retrieved Context Contamination

Retrieved context contamination occurs when external or retrieved content enters the model context and influences behavior beyond its intended role as reference material.

This is the architecture-level version of indirect prompt injection. OWASP describes indirect prompt injection as a case where an LLM accepts input from external sources such as websites or files, and the content alters model behavior when interpreted by the model. (OWASP Gen AI Security Project)

In RAG systems, the retrieval layer becomes part of the security boundary. Retrieved text may be authoritative, outdated, poisoned, irrelevant, unauthorized, or adversarially written.

This maps primarily to:

OWASP LLM01 for indirect prompt injection.
OWASP LLM08 for vector, embedding, and retrieval weaknesses.
NIST AI RMF mapping and measurement functions for lifecycle-level risk management.

The defensive principle: retrieved content should be labeled as untrusted data, scoped by access control, separated from instructions, and tested for injection behavior.

8. Monitoring and Reporting Control Failure

Monitoring and reporting control failure occurs when security-relevant behavior is not logged, surfaced, classified, or routed to the correct review path.

This failure mode is not a standalone OWASP LLM Top 10 category. It is a control-layer weakness. It can amplify other categories when prompt injection, unsafe output, excessive agency, or retrieval contamination occurs but is not detected.

This maps primarily to:

OWASP LLM01 when injection behavior is not detected.
OWASP LLM05 when unsafe output is not flagged.
OWASP LLM06 when tool actions are not reviewed.
NIST AI RMF governance, measurement, and management functions.

NIST AI 600-1 positions generative AI risk management across lifecycle stages and includes suggested actions for managing GAI risks. (NIST Publications)

The defensive principle: LLM applications need observability for prompt assembly, retrieval source, model output, validation result, tool selection, policy routing, and human approval.

9. Fail-Open Fallback Behavior

Fail-open fallback behavior occurs when validation, policy checking, retrieval, classification, or tool-selection logic fails, but the system continues with a permissive default.

This is not unique to LLM systems. It becomes more difficult to detect in LLM applications because the model may still produce a fluent and confident response even when the system lacked validated context.

Examples include:

retrieval failed, but the model answers anyway
validation failed, but the output is still sent downstream
risk classification is uncertain, but the system continues as low risk
tool selection fails, but the fallback tool has broader permissions
policy state is unavailable, but the response path remains permissive

This maps primarily to:

OWASP LLM05 if fallback output is consumed downstream.
OWASP LLM06 if fallback behavior enables action.
NIST AI RMF risk management and measurement controls.

The defensive principle: high-impact paths should fail closed. The system should distinguish between “the model produced an answer” and “the application had sufficient validated context to proceed.”

10. Synthetic Risk-Routing Manipulation

Synthetic risk-routing manipulation occurs when language causes the system to over-escalate, under-escalate, or misclassify an interaction because the model or classifier reacts to surface-level signals rather than validated context.

This failure mode is best treated as a testing category, not as a proven vulnerability class by default.

It is relevant for systems that use LLMs or classifiers to assign:

risk level
policy category
review queue
escalation route
safety mode
response template
tool availability

This maps primarily to:

OWASP LLM01 if user input changes model behavior.
OWASP LLM05 if the model-generated classification is consumed downstream.
NIST AI RMF measurement and management functions when classification quality affects risk controls.

The defensive principle: risk classification should be evaluated for false negatives, false positives, routing drift, and evidence quality. The goal is not to maximize blocking. The goal is to route the right event to the right control.

Mapping table

Failure mode	Primary boundary	OWASP mapping	NIST mapping	Control objective
Format-Boundary Prompt Injection	Format / schema	LLM01, LLM05	Measure / Manage	Validate semantics, not only syntax
System Prompt Exposure Attempt	Instruction	LLM07, LLM01	Govern / Manage	Keep secrets and authorization outside prompts
Session-State Boundary Reset	Session / memory	LLM01, LLM06	Map / Manage	Enforce state outside the model
Schema-Constrained Prompt Injection	Schema / output	LLM01, LLM05, LLM06	Measure / Manage	Validate structured output before use
Risk-Routing Manipulation	Classification / routing	LLM01, LLM05	Measure / Manage	Use independent routing controls
Echoed Output Contamination	Output	LLM05, LLM01	Measure / Manage	Treat model output as untrusted
Retrieved Context Contamination	Retrieval / context	LLM01, LLM08	Map / Measure	Separate retrieved data from instructions
Monitoring and Reporting Control Failure	Observability	LLM01, LLM05, LLM06	Govern / Measure / Manage	Log and surface security-relevant behavior
Fail-Open Fallback Behavior	Fallback / control flow	LLM05, LLM06	Manage	Fail closed on uncertainty
Synthetic Risk-Routing Manipulation	Risk routing	LLM01, LLM05	Measure / Manage	Test routing quality and evidence alignment

Defensive architecture implications

The defensive answer is not to write a stronger hidden prompt. A stronger system prompt may improve behavior, but it should not be treated as the security boundary.

The stronger architecture is layered.

1. Separate trusted instructions from untrusted content

The application should maintain a clear separation between:

system and developer instructions
user input
retrieved documents
tool output
prior conversation
model-generated text

The model may receive all of these as text, but the application should not treat all text as equal authority.

2. Keep secrets and authorization outside prompts

OWASP explicitly recommends avoiding sensitive data in system prompts and enforcing critical controls such as privilege separation and authorization checks independently from the LLM. (OWASP Gen AI Security Project)

System prompts should not contain:

credentials
connection strings
API keys
database names when sensitive
permission structures
operational secrets
hidden authorization logic

3. Validate output before downstream use

LLM output should not be passed directly into execution contexts, rendering contexts, workflow engines, tool calls, or database operations without validation.

The validation should be specific to the target context. HTML rendering, SQL generation, file writing, email sending, policy routing, and API calls require different controls.

4. Limit agency

For agentic systems, the highest-risk question is not whether the model can answer incorrectly. It is what the system allows the model to do after answering.

OWASP identifies excessive functionality, excessive permissions, and excessive autonomy as root causes of excessive agency. (OWASP Gen AI Security Project)

Defensive design should include:

narrow tool scope
least privilege
no unnecessary extensions
no open-ended shell/browser/database tools where avoidable
human approval for high-impact actions
separate agents or services for different privilege levels

5. Treat retrieved content as untrusted

RAG does not eliminate prompt injection risk. OWASP states that RAG and fine-tuning may improve relevance and accuracy, but they do not fully mitigate prompt-injection vulnerabilities. (OWASP Gen AI Security Project)

Retrieved content should be:

access-controlled
source-attributed
scoped to the user and task
separated from instructions
filtered for injection-like content
excluded from tool-control decisions unless independently validated

6. Make routing auditable

If the model assigns a risk label, policy category, tool route, escalation state, or workflow decision, that decision should be observable.

A secure review path should be able to answer:

what input was used
what retrieved context was included
what instruction hierarchy was applied
what output was generated
what validator accepted or rejected it
what tool was selected
what policy path was triggered
what human approval was required

7. Fail closed on uncertainty

A model can generate fluent output even when the system lacks evidence. That is a product behavior issue and a security issue.

High-impact workflows should not proceed when:

retrieval fails
source confidence is insufficient
validation fails
policy status is unknown
session state is inconsistent
tool arguments cannot be verified
user authorization cannot be confirmed

The system should expose uncertainty to deterministic controls instead of converting it into a normal answer.

Architecture review checklist

Area	Review question
Prompt assembly	Which parts of the final prompt are trusted instructions and which are untrusted data?
Instruction hierarchy	Can user or retrieved text override system/developer instructions?
System prompt	Does the system prompt contain secrets, permissions, credentials, or internal security logic?
Retrieval	Can external content influence instructions, tool calls, or policy decisions?
Schema	Does the schema validate only shape, or also semantic safety and downstream impact?
Output handling	Where does model output go after generation, and is it validated for that destination?
Tool use	What tools can the model call, and are they limited by least privilege?
Session state	Is identity, tenant, role, memory, or approval state enforced outside the model?
Risk routing	Can user-controlled language change severity, escalation, or control path?
Monitoring	Are suspicious retrievals, validation failures, tool calls, and routing decisions logged?
Fallbacks	Does the system fail closed when context, validation, or policy state is uncertain?

Practical review model

A useful review process is to inspect the LLM application as a request assembly and control-flow system.

For each user request, identify:

What the user provided.
What the application retrieved.
What the system inserted.
What the model generated.
What the application parsed.
What downstream system consumed.
What action became possible.
What was logged or escalated.

The security review should focus on the transitions:

user text → model instruction space
retrieved content → trusted context
model output → structured object
structured object → tool argument
model classification → policy route
fallback output → downstream workflow
untrusted content → stored memory

These transitions are where language becomes control.

Conclusion

Prompt injection is not only a prompt problem. It is a system boundary problem.

The model processes language. The application decides what that language can influence.

That distinction is the core security issue.

If untrusted text can alter instructions, contaminate retrieved context, shape structured output, reset session assumptions, trigger tools, affect risk routing, or bypass monitoring, the failure is architectural. It cannot be solved only by hiding stronger instructions inside the prompt.

OWASP provides the LLM application risk categories. NIST provides the broader risk-management frame. The boundary model connects both to implementation review.

The practical security standard is clear:

do not place secrets or authorization logic in prompts
do not treat model output as trusted
do not let retrieved text act as instruction
do not delegate high-impact decisions to unverified model output
do not give agents broad tools, permissions, or autonomy
do not let failures proceed silently

The model is not the security boundary.

The application architecture is.

Prompt Injection Is a Boundary Failure

Scope and method

Core thesis

Formal baseline

OWASP LLM01: Prompt Injection

OWASP LLM05: Improper Output Handling

OWASP LLM06: Excessive Agency

OWASP LLM07: System Prompt Leakage

OWASP LLM08: Vector and Embedding Weaknesses

Boundary failure modes

1. Format-Boundary Prompt Injection

2. System Prompt Exposure Attempt

3. Session-State Boundary Reset

4. Schema-Constrained Prompt Injection

5. Risk-Routing Manipulation

6. Echoed Output Contamination

7. Retrieved Context Contamination

8. Monitoring and Reporting Control Failure

9. Fail-Open Fallback Behavior

10. Synthetic Risk-Routing Manipulation

Mapping table

Defensive architecture implications

1. Separate trusted instructions from untrusted content

2. Keep secrets and authorization outside prompts

3. Validate output before downstream use

4. Limit agency

5. Treat retrieved content as untrusted

6. Make routing auditable

7. Fail closed on uncertainty

Architecture review checklist

Practical review model

Conclusion

Suggested reading

References

Scope and method

Core thesis

Formal baseline

OWASP LLM01: Prompt Injection

OWASP LLM05: Improper Output Handling

OWASP LLM06: Excessive Agency

OWASP LLM07: System Prompt Leakage

OWASP LLM08: Vector and Embedding Weaknesses

Boundary failure modes

1. Format-Boundary Prompt Injection

2. System Prompt Exposure Attempt

3. Session-State Boundary Reset

4. Schema-Constrained Prompt Injection

5. Risk-Routing Manipulation

6. Echoed Output Contamination

7. Retrieved Context Contamination

8. Monitoring and Reporting Control Failure

9. Fail-Open Fallback Behavior

10. Synthetic Risk-Routing Manipulation

Mapping table

Defensive architecture implications

1. Separate trusted instructions from untrusted content

2. Keep secrets and authorization outside prompts

3. Validate output before downstream use

4. Limit agency

5. Treat retrieved content as untrusted

6. Make routing auditable

7. Fail closed on uncertainty

Architecture review checklist

Practical review model

Conclusion

Suggested reading

References

Get new AI resources by email