The attack surface is the orchestration loop, not the model
What this page is: a vendor-agnostic threat model and control map for multi-step tool-using LLM systems.
What this page is not: a claim about any specific vendor trace or internal implementation.
Executive summary
When a system introduces an orchestration loop (a controller loop that repeatedly plan/act/observe/decide), the attack surface shifts from “the model” to the control plane that:
- selects plans and tools,
- carries state across steps,
- decides stop/retry,
- and can cross into write paths (side effects).
This page uses OWASP GenAI categories to structure risks and controls:
- LLM01 Prompt Injection (including indirect injection via retrieved/tool text),
- LLM06 Excessive Agency (capabilities/permissions exceed what is justified),
- LLM10 Unbounded Consumption (loop-driven cost/availability risk).
(References at the end.)
Core terms (as used here)
- Orchestration loop (controller loop): control logic that sequences steps (plan → act → observe → decide), including retries and stop conditions.
- Workflow: LLM + tools executed through predefined code paths (fixed order / graph).
- Agent: the model can dynamically propose its own steps and tool usage at runtime (within enforced bounds).
- Policy decision point (PDP): evaluates a proposed plan/action against policy and returns allow/deny.
- Policy enforcement point (PEP): enforces allow/deny before any side effect (e.g., before tool calls/writes).
- Instruction/data separation: prevents untrusted content from being interpreted as policy, permissions, routing constraints, approvals, or tool arguments.
- Write path: any operation that changes external state (create/update/delete, permissions, exports, configuration changes).
- Loop budgets: hard limits on steps/time/cost/retries that cannot be bypassed by the loop.
(Workflow vs agent distinctions are aligned with Anthropic and LangGraph; see References.)
Where the loop lives (control plane vs data plane)
A practical split for auditing is:
- Data plane: user input, retrieved text, tool observations (what the model reads).
- Control plane: orchestration + policy decision/enforcement logic (routing, tool selection, stop/retry, write-path enforcement).
Introducing a loop increases the number of control-plane decisions per request; the control plane becomes the dominant trust boundary.
Single-shot tool use (minimal control plane)
[request + context] -> [model] -> (optional tool call) -> [model] -> [final output]
Orchestration loop (expanded control plane)
[request + context]
|
v
[orchestration loop] ---> [tool router] ---> [tools/connectors]
| |
v v
[PDP/PEP checks] [tool output]
|
v
[stop / retry / next step]
Execution patterns: how risk shifts by orchestration pattern
The core differentiator is who decides the next tool/step and how many times that decision happens.
| Pattern | Orchestration shape | Who decides the next tool/step? | Dominant risk amplifiers | Primary enforcement points |
|---|---|---|---|---|
| Single-shot tool use | One/few tool calls, minimal iteration | Mostly application code + one model decision | Unsafe tool args; weak write-path enforcement; mishandled tool output | Tool allowlists, strict tool schemas, output handling, server-side write-path enforcement |
| Workflow (predetermined) | Fixed graph / code path | Graph logic | Reduced dynamism, but still exposed via tool I/O + data channels | Graph-level policy checks + tool contracts + write-path enforcement |
| ReAct-style loop | Iterative: reason → act → observe → repeat | Model outputs + loop logic each turn | Repeated exposure to untrusted observations; step chaining; stop-condition abuse | Step-level PDP/PEP checks, tool-arg constraints, loop budgets, provenance |
| Plan-and-execute | Plan first; execute step-by-step; may re-plan | Planner output + executor loop | Plan becomes an attack target; execution drift; plan/tool coupling | Plan validator + per-step PDP/PEP + write-path enforcement + budgets |
Notes:
- ReAct is a published research pattern (paper in References).
- Workflow vs agent distinctions are defined by Anthropic and LangGraph (References).
Minimal threat model
Attacker capability: can fully/partially control at least one untrusted artifact that the system ingests (prompt, retrieved page, ticket/file/email, or tool output).
Defender failure mode: the system treats parts of that artifact as control-plane authority (routing constraints, tool permissions, approvals, tool arguments, stop conditions).
Impact classes: (a) steering (wrong plan/tool), (b) exfiltration (retrieve/export), (c) unauthorized write (state change), often with (d) audit failure (missing provenance/correlation).
Why risk scales in an orchestration loop (mechanisms)
1) Indirect prompt injection becomes an execution channel (LLM01)
Indirect injection is instruction-like text embedded in external sources (web/files/tickets). In a loop, the same injected directive can influence multiple steps unless instruction/data separation and PDP/PEP enforcement are real (not “prompt-only”).
2) Plans and intermediate artifacts become attack targets
Plans, step lists, tool observations, and “notes so far” become decision inputs. If attackers can influence those artifacts (via retrieval or tool output), they can steer execution toward exfiltration or unsafe writes.
3) Retry amplification multiplies exposure
Retries improve reliability but increase the number of opportunities for compromised observations/summaries to influence downstream decisions.
4) Stop-condition and budget manipulation creates “eventual side effects”
Stop/retry logic is control-plane authority. Without enforced stop/budget rules, the loop tends to “search” until it eventually crosses a side-effect boundary.
5) Unbounded consumption becomes a security risk (LLM10)
In looped systems, budgets (steps/time/cost/retries) are a first-class security control for availability and cost containment.
Controls that reduce orchestration-loop risk (enforcement points)
1) Instruction/data separation (treat retrieval + tool outputs as untrusted)
Enforce:
- Retrieved/tool text is data, not policy.
- Tool selection and write approvals cannot be derived from untrusted text.
- Actionable directives must use structured, allowlisted formats.
2) Capability-scoped tools (least privilege) (LLM06)
- Separate read vs write capabilities.
- Bind scopes to explicit resources/tenants.
- Deny-by-default exposure of high-impact tools.
3) Server-side write-path enforcement (PDP → PEP per call)
Treat any external side effect as privileged. Authorize and enforce server-side per write call.
4) Strict tool schemas + validation
Validate tool selection + arguments in code (schema + semantic constraints). Do not treat model output as intrinsically valid authority.
5) Tool selection constraints (reduce action space)
Constrain selectable tools per run/intent category (deny-by-default) where the platform supports it.
6) Loop budgets + enforced stop conditions (LLM10)
Hard limits the loop cannot bypass:
- max steps / max tool calls
- time budget / retry caps per tool
- cost ceilings
- escalation rules after repeated failures (stop / approval / read-only)
7) End-to-end provenance and auditability
Log enough to reconstruct “what influenced what”:
- retrieved artifacts (stable references/hashes)
- plan + step history
- tool I/O (bounded/redacted)
- approvals/denials and PDP/PEP decisions
- final output
Implementation sketch (PDP/PEP + budgets)
Illustrative pseudocode showing enforcement points that must not depend on model compliance:
ingress(request):
principal = authenticate_and_bind(request) # tenant/user/session binding
mode = decide_mode(principal, request) # READ_ONLY / WRITE_CAPABLE
retrieved = retrieve(request)
retrieved = label_and_delimit(retrieved, trust="UNTRUSTED_DATA")
plan = model_propose_plan(request, retrieved) # untrusted intermediate artifact
decision = pdp_validate_plan(plan, principal, mode)
assert decision.allow
budgets = { max_steps:N, max_tool_calls:M, max_retries_per_tool:R, timeout_ms:T }
state = init_state(request, retrieved, decision)
while budgets_ok(budgets, state):
action = model_propose_next_action(state) # untrusted proposal
assert pep_validate_action(action, principal, mode, state)
if action.type == STOP:
return finalize(state)
assert action.type == TOOL_CALL
assert validate_tool_call(action, principal, state) # schema + semantics
if action.is_write:
assert server_side_write_gate(action, principal, state)
result = execute_tool(action)
record_provenance(action, result)
state = update_state(state, result)
What to test (security test cases)
| Test | Goal | Expected result |
|---|---|---|
| Retrieval injection test | Retrieved content cannot change tool allowlists, scopes, or authorization decisions | Tool selection remains constrained; policy gates reject elevation |
| Tool-arg validation test | Invalid or policy-violating arguments are rejected | Fail closed; violations logged with provenance |
| Write-gate test | Any write-capable call requires server-side authorization per call | Writes blocked without explicit authorization decision |
| Plan validation test | Planner cannot introduce tools/targets outside policy/tenant binding | Plan rejected or constrained by policy, not executed |
| Budget/stop test | Loops terminate under step/time/cost/retry ceilings | Loop stops deterministically; escalation rules trigger |
Suggested reading
- The Attack Surface Starts Before Agents — The LLM Boundary
- How Agentic Control-Plane Failures Actually Happen
- Agentic Systems: 8 Trust-Boundary Audit Checkpoints
- Request assembly threat model: reading the diagram
- Engineering Quality Gate — Procedure
References
- OWASP GenAI — LLM01:2025 Prompt Injection
- OWASP GenAI — LLM06:2025 Excessive Agency
- OWASP GenAI — LLM10:2025 Unbounded Consumption
- OWASP Cheat Sheet — LLM Prompt Injection Prevention
- OWASP Cheat Sheet — AI Agent Security
- NIST SP 800-207 — Zero Trust Architecture (DOI)
- NIST SP 800-207 — Zero Trust Architecture (PDF)
- NIST AI 600-1 — AI RMF: Generative AI Profile (DOI)
- NIST AI 600-1 — AI RMF: Generative AI Profile (PDF)
- OpenAI — Safety in building agents
- OpenAI — Function calling
- OpenAI — Structured outputs
- Anthropic — Building Effective Agents
- LangGraph — Workflows and agents
- ReAct — Yao et al. (arXiv:2210.03629)