The attack surface is the orchestration loop, not the model

By Published

What this page is: a vendor-agnostic threat model and control map for multi-step tool-using LLM systems.
What this page is not: a claim about any specific vendor trace or internal implementation.

Executive summary

When a system introduces an orchestration loop (a controller loop that repeatedly plan/act/observe/decide), the attack surface shifts from “the model” to the control plane that:

This page uses OWASP GenAI categories to structure risks and controls:

(References at the end.)


Core terms (as used here)

(Workflow vs agent distinctions are aligned with Anthropic and LangGraph; see References.)


Where the loop lives (control plane vs data plane)

A practical split for auditing is:

Introducing a loop increases the number of control-plane decisions per request; the control plane becomes the dominant trust boundary.

Single-shot tool use (minimal control plane)

[request + context] -> [model] -> (optional tool call) -> [model] -> [final output]

Orchestration loop (expanded control plane)

[request + context]
      |
      v
[orchestration loop] ---> [tool router] ---> [tools/connectors]
      |                       |
      v                       v
[PDP/PEP checks]          [tool output]
      |
      v
[stop / retry / next step]

Execution patterns: how risk shifts by orchestration pattern

The core differentiator is who decides the next tool/step and how many times that decision happens.

PatternOrchestration shapeWho decides the next tool/step?Dominant risk amplifiersPrimary enforcement points
Single-shot tool useOne/few tool calls, minimal iterationMostly application code + one model decisionUnsafe tool args; weak write-path enforcement; mishandled tool outputTool allowlists, strict tool schemas, output handling, server-side write-path enforcement
Workflow (predetermined)Fixed graph / code pathGraph logicReduced dynamism, but still exposed via tool I/O + data channelsGraph-level policy checks + tool contracts + write-path enforcement
ReAct-style loopIterative: reason → act → observe → repeatModel outputs + loop logic each turnRepeated exposure to untrusted observations; step chaining; stop-condition abuseStep-level PDP/PEP checks, tool-arg constraints, loop budgets, provenance
Plan-and-executePlan first; execute step-by-step; may re-planPlanner output + executor loopPlan becomes an attack target; execution drift; plan/tool couplingPlan validator + per-step PDP/PEP + write-path enforcement + budgets

Notes:


Minimal threat model

Attacker capability: can fully/partially control at least one untrusted artifact that the system ingests (prompt, retrieved page, ticket/file/email, or tool output).
Defender failure mode: the system treats parts of that artifact as control-plane authority (routing constraints, tool permissions, approvals, tool arguments, stop conditions).
Impact classes: (a) steering (wrong plan/tool), (b) exfiltration (retrieve/export), (c) unauthorized write (state change), often with (d) audit failure (missing provenance/correlation).


Why risk scales in an orchestration loop (mechanisms)

1) Indirect prompt injection becomes an execution channel (LLM01)

Indirect injection is instruction-like text embedded in external sources (web/files/tickets). In a loop, the same injected directive can influence multiple steps unless instruction/data separation and PDP/PEP enforcement are real (not “prompt-only”).

2) Plans and intermediate artifacts become attack targets

Plans, step lists, tool observations, and “notes so far” become decision inputs. If attackers can influence those artifacts (via retrieval or tool output), they can steer execution toward exfiltration or unsafe writes.

3) Retry amplification multiplies exposure

Retries improve reliability but increase the number of opportunities for compromised observations/summaries to influence downstream decisions.

4) Stop-condition and budget manipulation creates “eventual side effects”

Stop/retry logic is control-plane authority. Without enforced stop/budget rules, the loop tends to “search” until it eventually crosses a side-effect boundary.

5) Unbounded consumption becomes a security risk (LLM10)

In looped systems, budgets (steps/time/cost/retries) are a first-class security control for availability and cost containment.


Controls that reduce orchestration-loop risk (enforcement points)

1) Instruction/data separation (treat retrieval + tool outputs as untrusted)

Enforce:

2) Capability-scoped tools (least privilege) (LLM06)

3) Server-side write-path enforcement (PDP → PEP per call)

Treat any external side effect as privileged. Authorize and enforce server-side per write call.

4) Strict tool schemas + validation

Validate tool selection + arguments in code (schema + semantic constraints). Do not treat model output as intrinsically valid authority.

5) Tool selection constraints (reduce action space)

Constrain selectable tools per run/intent category (deny-by-default) where the platform supports it.

6) Loop budgets + enforced stop conditions (LLM10)

Hard limits the loop cannot bypass:

7) End-to-end provenance and auditability

Log enough to reconstruct “what influenced what”:


Implementation sketch (PDP/PEP + budgets)

Illustrative pseudocode showing enforcement points that must not depend on model compliance:

ingress(request):
  principal = authenticate_and_bind(request)        # tenant/user/session binding
  mode      = decide_mode(principal, request)       # READ_ONLY / WRITE_CAPABLE

  retrieved = retrieve(request)
  retrieved = label_and_delimit(retrieved, trust="UNTRUSTED_DATA")

  plan = model_propose_plan(request, retrieved)     # untrusted intermediate artifact

  decision = pdp_validate_plan(plan, principal, mode)
  assert decision.allow

  budgets = { max_steps:N, max_tool_calls:M, max_retries_per_tool:R, timeout_ms:T }
  state   = init_state(request, retrieved, decision)

  while budgets_ok(budgets, state):
    action = model_propose_next_action(state)       # untrusted proposal

    assert pep_validate_action(action, principal, mode, state)

    if action.type == STOP:
      return finalize(state)

    assert action.type == TOOL_CALL
    assert validate_tool_call(action, principal, state)  # schema + semantics

    if action.is_write:
      assert server_side_write_gate(action, principal, state)

    result = execute_tool(action)
    record_provenance(action, result)
    state = update_state(state, result)

What to test (security test cases)

Test Goal Expected result
Retrieval injection test Retrieved content cannot change tool allowlists, scopes, or authorization decisions Tool selection remains constrained; policy gates reject elevation
Tool-arg validation test Invalid or policy-violating arguments are rejected Fail closed; violations logged with provenance
Write-gate test Any write-capable call requires server-side authorization per call Writes blocked without explicit authorization decision
Plan validation test Planner cannot introduce tools/targets outside policy/tenant binding Plan rejected or constrained by policy, not executed
Budget/stop test Loops terminate under step/time/cost/retry ceilings Loop stops deterministically; escalation rules trigger

Suggested reading

References