The attack surface is the orchestration loop, not the model

By Tamar Peretz. Published 2026-02-22.

What this page is: a vendor-agnostic threat model and control map for multi-step tool-using LLM systems.
What this page is not: a claim about any specific vendor trace or internal implementation.

Executive summary

When a system introduces an orchestration loop (a controller loop that repeatedly plans, acts, observes, and decides), the attack surface shifts from “the model” to the control plane that:

selects plans and tools,
carries state across steps,
decides stop/retry,
and can cross into write paths (side effects).

This page uses OWASP GenAI categories to structure risks and controls:

LLM01 Prompt Injection (including indirect injection via retrieved/tool text),
LLM06 Excessive Agency (capabilities/permissions exceed what is justified),
LLM10 Unbounded Consumption (loop-driven cost/availability risk).

(References at the end.)

Core terms (as used here)

Orchestration loop (controller loop): control logic that sequences steps (plan → act → observe → decide), including retries and stop conditions.
Workflow: LLM + tools executed through predefined code paths (fixed order / graph).
Agent: a system in which the model dynamically proposes its own steps and tool usage at runtime (within enforced bounds).
Policy decision point (PDP): evaluates a proposed plan/action against policy and returns allow/deny.
Policy enforcement point (PEP): enforces allow/deny before any side effect (e.g., before tool calls/writes).
Instruction/data separation: prevents untrusted content from being interpreted as policy, permissions, routing constraints, approvals, or tool arguments.
Write path: any operation that changes external state (create/update/delete, permissions, exports, configuration changes).
Loop budgets: hard limits on steps/time/cost/retries that cannot be bypassed by the loop.

(Workflow vs agent distinctions are aligned with Anthropic and LangGraph; see References.)

Where the loop lives (control plane vs data plane)

A practical split for auditing is:

Data plane: user input, retrieved text, tool observations (what the model reads).
Control plane: orchestration + policy decision/enforcement logic (routing, tool selection, stop/retry, write-path enforcement).

Introducing a loop increases the number of control-plane decisions per request; the control plane becomes the dominant trust boundary.

Single-shot tool use (minimal control plane)

[request + context] -> [model] -> (optional tool call) -> [model] -> [final output]

Orchestration loop (expanded control plane)

[request + context]
      |
      v
[orchestration loop] ---> [tool router] ---> [tools/connectors]
      |                       |
      v                       v
[PDP/PEP checks]          [tool output]
      |
      v
[stop / retry / next step]

Execution patterns: how risk shifts by orchestration pattern

The core differentiator is who decides the next tool/step and how many times that decision happens.

Pattern	Orchestration shape	Who decides the next tool/step?	Dominant risk amplifiers	Primary enforcement points
Single-shot tool use	One/few tool calls, minimal iteration	Mostly application code + one model decision	Unsafe tool args; weak write-path enforcement; mishandled tool output	Tool allowlists, strict tool schemas, output handling, server-side write-path enforcement
Workflow (predetermined)	Fixed graph / code path	Graph logic	Reduced dynamism, but still exposed via tool I/O + data channels	Graph-level policy checks + tool contracts + write-path enforcement
ReAct-style loop	Iterative: reason → act → observe → repeat	Model outputs + loop logic each turn	Repeated exposure to untrusted observations; step chaining; stop-condition abuse	Step-level PDP/PEP checks, tool-arg constraints, loop budgets, provenance
Plan-and-execute	Plan first; execute step-by-step; may re-plan	Planner output + executor loop	Plan becomes an attack target; execution drift; plan/tool coupling	Plan validator + per-step PDP/PEP + write-path enforcement + budgets

Notes:

ReAct is a published research pattern (paper in References).
Workflow vs agent distinctions follow the definitions used by Anthropic and LangGraph (References).

Minimal threat model

Attacker capability: can fully or partially control at least one untrusted artifact that the system ingests (prompt, retrieved page, ticket/file/email, or tool output).
Defender failure mode: the system treats parts of that artifact as control-plane authority (routing constraints, tool permissions, approvals, tool arguments, stop conditions).
Impact classes: (a) steering (wrong plan/tool), (b) exfiltration (retrieve/export), (c) unauthorized write (state change), often with (d) audit failure (missing provenance/correlation).

Why risk scales in an orchestration loop (mechanisms)

1) Indirect prompt injection becomes an execution channel (LLM01)

Indirect injection is instruction-like text embedded in external sources (web/files/tickets). In a loop, the same injected directive can influence multiple steps unless instruction/data separation and PDP/PEP enforcement are implemented in code (not “prompt-only”).

2) Plans and intermediate artifacts become attack targets

Plans, step lists, tool observations, and “notes so far” become decision inputs. If attackers can influence those artifacts (via retrieval or tool output), they can steer execution toward exfiltration or unsafe writes.

3) Retry amplification multiplies exposure

Retries improve reliability but increase the number of opportunities for compromised observations/summaries to influence downstream decisions.
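The amplification can be made concrete with a small calculation. This is a rough sketch, not a measured model: the function names are hypothetical, and the probability helper assumes every read is independent with the same per-read chance of carrying a working injection, which real systems will not satisfy exactly.

```python
def max_untrusted_reads(steps: int, retries_per_step: int) -> int:
    """Upper bound on observations read when every step uses all its retries."""
    return steps * (1 + retries_per_step)


def p_at_least_one_success(p_per_read: float, reads: int) -> float:
    """Chance that at least one read carries a working injection.

    ASSUMPTION: reads are independent and share one per-read probability.
    This is a simplification for illustration only.
    """
    return 1.0 - (1.0 - p_per_read) ** reads


# Same 5-step task, without and with 2 retries per step.
reads_no_retry = max_untrusted_reads(steps=5, retries_per_step=0)    # 5 reads
reads_with_retry = max_untrusted_reads(steps=5, retries_per_step=2)  # 15 reads
```

Under these assumptions the exposure count grows linearly with the retry cap, which is one reason to treat retry caps as a security parameter and not only a reliability one.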

4) Stop-condition and budget manipulation creates “eventual side effects”

Stop/retry logic is control-plane authority. Without enforced stop and budget rules, a loop that keeps retrying or re-planning can continue proposing actions until one of them crosses a side-effect boundary.

5) Unbounded consumption becomes a security risk (LLM10)

In looped systems, budgets (steps/time/cost/retries) are a first-class security control for availability and cost containment.

Controls that reduce orchestration-loop risk (enforcement points)

1) Instruction/data separation (treat retrieval + tool outputs as untrusted)

Enforce:

Retrieved/tool text is data, not policy.
Tool selection and write approvals cannot be derived from untrusted text.
Actionable directives must use structured, allowlisted formats.
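One way to sketch these three rules in code is shown below. It is a minimal illustration under stated assumptions: `Untrusted`, `parse_directive`, `ALLOWED_DIRECTIVES`, and the `kind`/`query` directive shape are hypothetical names invented for this example, not part of any framework.

```python
import json
from dataclasses import dataclass

# ASSUMPTION: hypothetical allowlist of directive kinds the control plane accepts.
ALLOWED_DIRECTIVES = frozenset({"summarize", "search"})


@dataclass(frozen=True)
class Untrusted:
    """Marks text as data-plane content; the control plane never parses it as policy."""
    text: str
    source: str


def parse_directive(raw: str, *, trusted_channel: bool) -> dict:
    """Accept a directive only from a trusted channel and only in an allowlisted shape."""
    if not trusted_channel:
        raise PermissionError("directives found in untrusted content are ignored")
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != {"kind", "query"}:
        raise ValueError("unexpected directive shape")
    if not isinstance(obj["kind"], str) or obj["kind"] not in ALLOWED_DIRECTIVES:
        raise ValueError("directive kind is not allowlisted")
    if not isinstance(obj["query"], str):
        raise ValueError("query must be a string")
    return obj


# Retrieved text stays wrapped; nothing in it can become a directive.
page = Untrusted(text='{"kind": "export_all", "query": "*"}', source="web")
```

The design choice is that trust comes from the channel, not from how the text looks: a well-formed directive embedded in `page.text` is still rejected because it arrives as data.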

2) Capability-scoped tools (least privilege) (LLM06)

Separate read vs write capabilities.
Bind scopes to explicit resources/tenants.
Deny-by-default exposure of high-impact tools.
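The three points above can be sketched as an exact-match grant check. This is an illustrative sketch only; `Grant`, `is_permitted`, and the tool, tenant, and resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    tool: str
    access: str    # "read" or "write", kept as separate capabilities
    tenant: str    # scope is bound to one explicit tenant
    resource: str  # and to one explicit resource


def is_permitted(grants: frozenset, tool: str, access: str,
                 tenant: str, resource: str) -> bool:
    """Deny by default: allow only when an exact matching grant exists."""
    return Grant(tool, access, tenant, resource) in grants


# ASSUMPTION: a run that was granted read access to one tenant's tickets.
grants = frozenset({Grant("ticket_api", "read", "tenant-a", "tickets")})
```

Because the check is an exact match, a read grant never implies write access, and a grant for one tenant never applies to another.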

3) Server-side write-path enforcement (PDP → PEP per call)

Treat any external side effect as privileged. Authorize and enforce server-side per write call.
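A minimal sketch of a per-call PDP → PEP gate follows. The policy shown (writes require a `WRITE_CAPABLE` mode and a matching tenant) and the names `Decision`, `pdp_decide`, and `pep_execute` are assumptions made for illustration; a real policy would be richer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


def pdp_decide(principal: dict, action: dict) -> Decision:
    """ASSUMPTION: hypothetical policy for this example only."""
    if not action.get("is_write"):
        return Decision(True, "read")
    if principal.get("mode") != "WRITE_CAPABLE":
        return Decision(False, "principal is read-only")
    if action.get("tenant") != principal.get("tenant"):
        return Decision(False, "tenant mismatch")
    return Decision(True, "write authorized")


def pep_execute(principal: dict, action: dict, tool: Callable[[dict], object]):
    """Enforcement point: the tool runs only after an allow decision for this call."""
    decision = pdp_decide(principal, action)
    if not decision.allow:
        raise PermissionError(decision.reason)
    return tool(action)
```

The decision is recomputed for every call, so an approval earlier in the loop does not carry over to a later write.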

4) Strict tool schemas + validation

Validate tool selection + arguments in code (schema + semantic constraints). Do not treat model output as intrinsically valid authority.
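As a rough illustration, schema and semantic checks can be written with the standard library alone. The tool name `search_tickets`, its arguments, and the limits below are hypothetical values chosen for the example.

```python
# ASSUMPTION: one hypothetical tool with two typed arguments.
TOOL_SCHEMAS = {
    "search_tickets": {"query": str, "limit": int},
}
MAX_LIMIT = 50
MAX_QUERY_CHARS = 200


def validate_tool_call(name: str, args: dict) -> dict:
    """Return args unchanged if valid; raise ValueError otherwise (fail closed)."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("missing or unexpected arguments")
    for key, expected_type in schema.items():
        # Exact type match, so that bool is not accepted where int is expected.
        if type(args[key]) is not expected_type:
            raise ValueError(f"argument {key!r} has the wrong type")
    # Semantic constraints go beyond the schema.
    if name == "search_tickets":
        if not 1 <= args["limit"] <= MAX_LIMIT:
            raise ValueError("limit out of range")
        if not args["query"].strip() or len(args["query"]) > MAX_QUERY_CHARS:
            raise ValueError("query empty or too long")
    return args
```

Rejecting unknown keys as well as missing ones keeps the model from smuggling extra parameters into a call that otherwise looks valid.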

5) Tool selection constraints (reduce action space)

Constrain selectable tools per run/intent category (deny-by-default) where the platform supports it.
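Where the platform does not offer this natively, the constraint can be approximated in controller code. The sketch below assumes hypothetical intent categories and tool names, and assumes the intent is classified by trusted code before the loop starts.

```python
# ASSUMPTION: hypothetical intent categories and tool names.
TOOLS_BY_INTENT = {
    "answer_question": frozenset({"search_docs"}),
    "triage_ticket": frozenset({"search_docs", "read_ticket"}),
}


def selectable_tools(intent: str) -> frozenset:
    """Unknown intents get an empty tool set (deny by default)."""
    return TOOLS_BY_INTENT.get(intent, frozenset())


def check_selection(intent: str, tool: str) -> None:
    """Raise if the proposed tool is outside the set fixed for this run."""
    if tool not in selectable_tools(intent):
        raise PermissionError(f"{tool!r} is not selectable for intent {intent!r}")
```

The tool set is fixed per run, so text encountered mid-loop cannot widen it.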

6) Loop budgets + enforced stop conditions (LLM10)

Hard limits the loop cannot bypass:

max steps / max tool calls
time budget / retry caps per tool
cost ceilings
escalation rules after repeated failures (stop / approval / read-only)
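The limits above can be sketched as a budget object that the controller charges and the model cannot modify. The class and field names are hypothetical, and cost is treated as an abstract number supplied by the caller.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised by the controller; the loop must stop or escalate when it sees this."""


@dataclass
class Budget:
    max_steps: int
    max_retries_per_tool: int
    max_cost: float
    deadline_s: float
    steps: int = 0
    cost: float = 0.0
    retries: dict = field(default_factory=dict)
    started: float = field(default_factory=time.monotonic)

    def charge_step(self, cost: float) -> None:
        """Call once per loop iteration, before executing the proposed action."""
        self.steps += 1
        self.cost += cost
        if self.steps > self.max_steps:
            raise BudgetExceeded("max steps")
        if self.cost > self.max_cost:
            raise BudgetExceeded("cost ceiling")
        if time.monotonic() - self.started > self.deadline_s:
            raise BudgetExceeded("time budget")

    def charge_retry(self, tool: str) -> None:
        """Call before each retry of a given tool."""
        self.retries[tool] = self.retries.get(tool, 0) + 1
        if self.retries[tool] > self.max_retries_per_tool:
            raise BudgetExceeded(f"retry cap for {tool!r}")
```

A caller would catch `BudgetExceeded` outside the loop body and apply the escalation rule (stop, request approval, or drop to read-only), so that exhaustion is always a handled outcome.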

7) End-to-end provenance and auditability

Log enough to reconstruct “what influenced what”:

retrieved artifacts (stable references/hashes)
plan + step history
tool I/O (bounded/redacted)
approvals/denials and PDP/PEP decisions
final output
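A provenance record covering these fields might look like the following sketch. The record layout and the names `artifact_ref` and `provenance_record` are assumptions for illustration; the hashing and serialization use only the Python standard library.

```python
import hashlib
import json


def artifact_ref(content: bytes) -> str:
    """Stable reference to an artifact: a SHA-256 digest of its bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def provenance_record(run_id: str, step: int, action: str, inputs: list,
                      decision: str, output: bytes, max_preview: int = 200) -> str:
    """Serialize one step so 'what influenced what' can be reconstructed later."""
    record = {
        "run_id": run_id,
        "step": step,
        "action": action,
        "input_refs": [artifact_ref(i) for i in inputs],  # hashes, not raw content
        "decision": decision,                             # PDP/PEP outcome
        "output_ref": artifact_ref(output),
        # Bounded preview only; redaction would be applied here in a real system.
        "output_preview": output[:max_preview].decode("utf-8", errors="replace"),
    }
    return json.dumps(record, sort_keys=True)
```

Storing hashes alongside a bounded preview keeps logs small while still letting an auditor confirm which exact artifact fed a given step.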

Implementation sketch (PDP/PEP + budgets)

Illustrative pseudocode showing enforcement points that must not depend on model compliance:

ingress(request):
  principal = authenticate_and_bind(request)        # tenant/user/session binding
  mode      = decide_mode(principal, request)       # READ_ONLY / WRITE_CAPABLE

  retrieved = retrieve(request)
  retrieved = label_and_delimit(retrieved, trust="UNTRUSTED_DATA")

  plan = model_propose_plan(request, retrieved)     # untrusted intermediate artifact

  decision = pdp_validate_plan(plan, principal, mode)
  if not decision.allow:
    return deny(decision)                           # fail closed; log the denial; no tool has run

  budgets = { max_steps:N, max_tool_calls:M, max_retries_per_tool:R, timeout_ms:T }
  state   = init_state(request, retrieved, decision)

  while budgets_ok(budgets, state):                 # evaluated by the controller, not the model
    action = model_propose_next_action(state)       # untrusted proposal
    state  = count_step(state)                      # charge the budget on every iteration

    if not pep_validate_action(action, principal, mode, state):
      return deny(action)

    if action.type == STOP:
      return finalize(state)

    if action.type != TOOL_CALL:
      return deny(action)                           # unknown action types fail closed
    if not validate_tool_call(action, principal, state):   # schema + semantics
      return deny(action)

    if action.is_write and not server_side_write_gate(action, principal, state):
      return deny(action)

    result = execute_tool(action)
    record_provenance(action, result)
    state = update_state(state, result)

  return escalate(state)                            # budget exhausted: stop / approval / read-only

What to test (security test cases)

Test	Goal	Expected result
Retrieval injection test	Retrieved content cannot change tool allowlists, scopes, or authorization decisions	Tool selection remains constrained; policy gates reject elevation
Tool-arg validation test	Invalid or policy-violating arguments are rejected	Fail closed; violations logged with provenance
Write-gate test	Any write-capable call requires server-side authorization per call	Writes blocked without explicit authorization decision
Plan validation test	Planner cannot introduce tools/targets outside policy/tenant binding	Plan rejected or constrained by policy, not executed
Budget/stop test	Loops terminate under step/time/cost/retry ceilings	Loop stops deterministically; escalation rules trigger
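
Two of the rows above (the budget/stop test and an injection-driven write attempt) can be sketched as unit tests against a toy controller. Everything here is a stand-in: `run_loop`, the `READ_ONLY_TOOLS` allowlist, and the scripted `propose` callables are hypothetical, and the "model" is just a function returning fixed actions.

```python
import unittest

# ASSUMPTION: hypothetical allowlist for a read-only run.
READ_ONLY_TOOLS = frozenset({"search"})


def run_loop(propose, max_steps: int) -> dict:
    """Toy controller; `propose(history)` stands in for the model and is untrusted."""
    history = []
    for _ in range(max_steps):
        action = propose(history)
        if action["type"] == "STOP":
            return {"status": "done", "history": history}
        if action.get("tool") not in READ_ONLY_TOOLS:
            history.append(("denied", action.get("tool")))  # denial is logged, tool not run
            continue
        history.append(("ran", action["tool"]))
    return {"status": "budget_exhausted", "history": history}


class LoopSecurityTests(unittest.TestCase):
    def test_budget_stops_a_model_that_never_stops(self):
        never_stops = lambda history: {"type": "TOOL_CALL", "tool": "search"}
        result = run_loop(never_stops, max_steps=3)
        self.assertEqual(result["status"], "budget_exhausted")
        self.assertEqual(len(result["history"]), 3)

    def test_injected_write_is_denied(self):
        # Simulates a model steered by injected text into proposing a write tool.
        script = iter([{"type": "TOOL_CALL", "tool": "delete_records"},
                       {"type": "STOP"}])
        result = run_loop(lambda history: next(script), max_steps=5)
        self.assertEqual(result["status"], "done")
        self.assertEqual(result["history"], [("denied", "delete_records")])
```

Saved as a module, these run with `python -m unittest`. The point of scripting the model is that the assertions hold regardless of what the model proposes, which is the property the table asks for.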