The attack surface starts before agents: the LLM integration trust boundary
What this page is: a vendor-agnostic threat-modeling worksheet for the earliest trust boundary where LLM I/O can touch production systems.
What this page is not: a claim about LLM internals or any specific vendor trace.
Executive summary
Before you adopt an agent framework, many high-leverage security controls are decided at the LLM integration trust boundary: the first interface where model inputs/outputs can (a) read production data, (b) write to production systems, or (c) enter production observability (logs/telemetry/traces).
This page treats that interface as a trust boundary and maps it to OWASP GenAI LLM Top 10 (2025) risk categories.
How to use this worksheet
1) Fill the read paths table with every source that can enter context (including retrieval and tool outputs).
2) Fill the write paths table with every sink where model outputs can land (including observability and persistence).
3) For each boundary crossing, record the owner, the server-side enforcement point, and the minimum audit evidence required to reconstruct an incident.
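The three steps above can be captured as a small machine-readable record rather than prose. A minimal sketch in Python (field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class BoundaryCrossing:
    """One row of the worksheet: a single place where model I/O crosses
    into production data, observability, or tool surfaces."""
    name: str                      # e.g. "retrieved docs -> context"
    direction: str                 # "read" (into model) or "write" (from model)
    owner: str                     # accountable control owner
    enforcement_point: str         # server-side component that enforces policy
    audit_evidence: list = field(default_factory=list)  # minimum data to reconstruct an incident

    def is_complete(self) -> bool:
        # A crossing is only "done" when owner, enforcement, and evidence are all recorded.
        return bool(self.owner and self.enforcement_point and self.audit_evidence)
```

Keeping the inventory in a structure like this makes step 3 checkable: any crossing with a missing owner, enforcement point, or evidence list fails review.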
Scope and evidence boundary
This article is a threat-modeling guide for a specific control point before agent frameworks: the earliest interface where LLM I/O can touch production data or production observability.
It does not claim mechanism-level properties about LLMs. Where risk categories are referenced, they are pinned to OWASP GenAI LLM Top 10 (2025) (see References).
Definition: the LLM integration trust boundary
LLM integration trust boundary = the first interface where LLM inputs/outputs can read from or write to:
- production data stores, or
- production observability systems (logs / telemetry / traces), or
- tool/API surfaces that can cause side effects.
Operationally, this is the trust boundary between model I/O and production systems.
Why this boundary matters (even without agents)
OWASP’s LLM Top 10 (2025) includes risks that apply to non-agent LLM apps when model I/O is connected to real systems:
- LLM01:2025 Prompt Injection — untrusted inputs steer behavior/output.
- LLM02:2025 Sensitive Information Disclosure — sensitive data leaks via outputs or context handling.
- LLM05:2025 Improper Output Handling — unsafe downstream consumption of model outputs.
- LLM06:2025 Excessive Agency — the system grants tools/permissions/autonomy beyond minimum needed.
Threat scenarios at the trust boundary (protocol-level)
Scenario A — indirect prompt injection via external content + tool access
If the system ingests untrusted content (email/docs/web) and also enables tool calls, an attacker can place instruction-like payloads in that content.
OWASP’s Excessive Agency guidance includes mailbox-assistant scenarios where untrusted inputs can trigger sensitive-data access and exfiltration. Mitigations include minimizing extensions, least-privilege scopes, and requiring user approval for high-impact actions.
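The "user approval for high-impact actions" mitigation can be enforced in a few lines of server-side code. A minimal sketch (the tool names and `HIGH_IMPACT` set are illustrative, not a standard API):

```python
# Approval gate for high-impact tool calls. Nothing the model emits --
# including instructions injected via untrusted content -- can bypass it,
# because the check runs outside the prompt.
HIGH_IMPACT = {"send_email", "delete_record", "transfer_funds"}  # illustrative

def gate_tool_call(tool_name: str, user_approved: bool) -> bool:
    """Return True only if the call may proceed. High-impact tools
    require explicit user approval, regardless of what was requested."""
    if tool_name in HIGH_IMPACT:
        return user_approved
    return True
```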
Scenario B — improper output handling downstream
If model output is passed, without strict validation/sanitization, into:
- command execution,
- templating/HTML rendering,
- policy/routing decisions,
- database writes,
then the output becomes an injection surface (even if the user never sees it).
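The mitigation pattern is the same for every sink: treat the model's answer as untrusted input and validate it against a closed set before acting on it. A minimal sketch for a routing sink (the allowlist is illustrative):

```python
# Validate model output before it reaches a downstream sink.
ALLOWED_ROUTES = {"billing", "support", "sales"}  # illustrative allowlist

def route_from_model_output(output: str) -> str:
    """Treat the model's answer as untrusted: normalize it, then check it
    against a closed allowlist before it influences any routing decision."""
    candidate = output.strip().lower()
    if candidate not in ALLOWED_ROUTES:
        return "fallback"   # never pass unvalidated output downstream
    return candidate
```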
Scenario C — sensitive data exposure via stored artifacts
If sensitive inputs/outputs are stored (logs/telemetry/analytics/memory/RAG indexes), the exposure surface includes retention, access control, and replay into future prompts.
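Redaction before persistence is the simplest control here: nothing should reach logs or telemetry in raw form. A minimal sketch (the single email pattern is illustrative; a real policy needs a maintained redaction ruleset plus field-level minimization):

```python
import re

# Redact obvious sensitive patterns before anything is persisted
# to logs/telemetry/analytics.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # illustrative pattern

def redact_for_logging(text: str) -> str:
    """Apply redaction at the write path, before storage -- not at read time."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)
```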
Mapping worksheet
1) Read paths into the model (inputs)
Document every source that can enter context:
| Source | Trust level | Sensitivity | Transformations before model | Notes |
|---|---|---|---|---|
| User message | Untrusted | Varies | Redaction? | |
| Retrieved docs / web | Untrusted | Varies | Filtering / allowlist | |
| Tickets/CRM/email summaries | Untrusted (default) | Often sensitive | Redaction + minimization | |
| Database reads | Trusted (system) | Often sensitive | Field-level selection | |
| Tool outputs (if re-injected) | Untrusted (default) | Varies | Sanitization + provenance tags | |
2) Write paths from the model (outputs)
Document where outputs can land:
| Sink | Persisted? | Retention/TTL | Readers | Replay into prompts? | Controls |
|---|---|---|---|---|---|
| Product UI | No/Yes | — | End user | Maybe | Output policies |
| Logs / telemetry / traces | Yes | Defined TTL | Operators | Possible | Redaction + access controls |
| Analytics events | Yes | Defined TTL | Analysts | Possible | Minimization |
| Memory / context store | Yes | Defined TTL | System | Yes | Scoped + gated writes |
| Tools / internal APIs | Yes | — | Systems | — | Server-side authz + validation |
| Routing / feature flags | Yes | — | System | Yes | Deterministic gating |
3) Owner, enforcement, and audit evidence
For each boundary crossing, record:
- Owner (accountable control owner),
- Server-side enforcement point (enforced in code or configuration, not via prompt instructions),
- Audit evidence (minimum data required to reconstruct an incident).
Minimum controls at the trust boundary (vendor-agnostic)
Control 1 — data policy for model-visible content (enforced)
Define:
- what the model may see,
- what must be redacted/minimized,
- what can be stored, where, and for how long,
- what can be replayed into future prompts.
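This policy is only a control if it exists as enforced configuration, not prose. A minimal sketch (source names, keys, and values are illustrative):

```python
# Control 1 as configuration: visibility, storage, TTL, and replay are
# decided per source, server-side, with deny-by-default for unknowns.
DATA_POLICY = {
    "user_message": {"model_visible": True, "store": "logs", "ttl_days": 30, "replay": False},
    "db_read":      {"model_visible": True, "store": None,   "ttl_days": 0,  "replay": False},
    "tool_output":  {"model_visible": True, "store": "logs", "ttl_days": 7,  "replay": True},
}

def may_replay(source: str) -> bool:
    """Default-deny: sources absent from the policy are never replayed
    into future prompts."""
    return DATA_POLICY.get(source, {}).get("replay", False)
```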
Control 2 — server-side authorization and validation for any side effects
If outputs can influence tools, writes, routing, or flags:
- enforce authorization + validation outside the prompt,
- minimize permissions and available functions,
- require user review/approval for high-impact actions.
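"Outside the prompt" means the authorization check consults the caller's actual granted scopes, not anything the model claims. A minimal sketch (scope names and registry shape are illustrative):

```python
# Least-privilege tool exposure enforced server-side.
TOOL_SCOPES = {
    "search_docs": "read:docs",    # illustrative registry
    "send_email":  "write:email",
}

def authorize_tool_call(tool_name: str, granted_scopes: set) -> bool:
    """The call proceeds only if the tool is registered AND the session
    actually holds the required scope. Nothing in the model's output
    can widen this decision."""
    required = TOOL_SCOPES.get(tool_name)
    return required is not None and required in granted_scopes
```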
Control 3 — treat retrieved content and tool outputs as untrusted data
- maintain a strict instruction hierarchy (policy/controller > tool data > user data),
- constrain untrusted content so it cannot act as policy or permissions,
- attach provenance (source, time, workflow) when re-injecting content into context.
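Provenance attachment can be as simple as wrapping untrusted content in a typed envelope before re-injection, so downstream policy can distinguish data from instructions. A minimal sketch (field names are illustrative):

```python
# Wrap untrusted content with provenance before it re-enters context.
def tag_untrusted(content: str, source: str, retrieved_at: str) -> dict:
    """The envelope pins the content's role to "data": it can never be
    interpreted as policy or carry permissions, whatever it says inside."""
    return {
        "role": "data",            # never "system"/"policy"
        "source": source,          # where it came from
        "retrieved_at": retrieved_at,
        "content": content,
    }
```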
Control 4 — auditability without over-collection
You should be able to reconstruct:
- what entered context (at least pointers + provenance),
- what output was produced,
- what actions were requested/performed/blocked,
- why.
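These four items can be captured in one structured audit event per boundary crossing, storing pointers rather than raw payloads to avoid over-collection. A minimal sketch (field names are illustrative):

```python
import json

# A minimum audit event: enough to reconstruct
# inputs -> decision -> output -> action without storing full content.
def audit_event(request_id: str, context_refs: list, action: str,
                decision: str, reason: str) -> str:
    return json.dumps({
        "request_id": request_id,
        "context_refs": context_refs,   # pointers + provenance, not raw content
        "action": action,               # requested side effect
        "decision": decision,           # "performed" | "blocked"
        "reason": reason,               # which policy rule fired, and why
    })
```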
What you should have when done
- A complete inventory of context inputs (sources + trust level + transformations).
- A complete inventory of output sinks (persistence + retention + readers + replay risk).
- A list of server-side enforcement points for side effects (authz + validation + least privilege).
- A minimum audit evidence set sufficient to reconstruct a timeline (inputs → decisions → outputs → actions).
Copy/paste checklist
- I can point to the first place model I/O touches production data, observability, or tools.
- Every context source is classified (trust + sensitivity) and transformed before ingestion.
- Every output sink is documented (persistence, retention, readers, replay risk).
- Side-effect actions are gated server-side (authz + validation + least privilege).
- High-impact actions require explicit review/approval.
- Audit evidence exists to reconstruct an incident timeline.
Suggested reading
- The attack surface is the orchestration loop, not the model
- Request assembly threat model: reading the diagram
- How Agentic Control-Plane Failures Actually Happen
- Engineering Quality Gate — Procedure
- Content map
References (pinned)
OWASP GenAI (2025):
- LLM Top 10 index (2025)
- LLM01:2025 Prompt Injection
- LLM02:2025 Sensitive Information Disclosure
- LLM05:2025 Improper Output Handling
- LLM06:2025 Excessive Agency