
Thesis

In tool-using systems, the primary attack surface is not the language model itself. The risk scales in the controller loop (the orchestration layer that plans, calls tools, evaluates outcomes, retries, and decides when to stop).

Two execution shapes (single-shot vs. loop)

Many deployments have a “tool-use” capability. The security shift happens when tool-use is wrapped in a multi-step control loop.

Single-shot tool use (one tool call, minimal iteration)

Typical shape:

Agentic tool use (planner/controller loop with retries)

Typical shape:

Key point: as soon as you introduce a loop, you introduce more decision points, more tool invocations, and more opportunities for steering.
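To make the difference concrete, here is a minimal sketch of the two shapes. The `model` and `tool` callables are hypothetical stand-ins rather than any real library's API, and the `"DONE"` convention is an assumption made only for illustration.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins: `model` maps a context string to a decision or answer,
# `tool` maps an argument to an output. Neither is a real library API.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def single_shot(model: Model, tool: Tool, task: str) -> Tuple[str, int]:
    """One model decision, one tool call, one answer."""
    arg = model(task)
    output = tool(arg)
    return model(f"{task}\n{output}"), 1  # (answer, number of tool calls)


def agent_loop(model: Model, tool: Tool, task: str, max_steps: int = 5) -> Tuple[str, int]:
    """Plan, call, evaluate, repeat. Every iteration is a new decision point."""
    context, calls = task, 0
    for _ in range(max_steps):
        decision = model(context)
        if decision.startswith("DONE"):  # assumed stop signal from the model
            return decision, calls
        context += "\n" + tool(decision)  # tool output feeds the next decision
        calls += 1
    return "STOPPED: step budget exhausted", calls
```

In the single-shot shape, tool output influences exactly one model call. In the loop, each tool output is appended to the context that drives every later decision, which is where the extra steering opportunities come from.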

Why risk scales in the loop

1) Indirect prompt injection gains leverage

When untrusted content is retrieved (web pages, files, tickets, emails), it can function as an instruction channel unless you enforce instruction/data separation. In a loop, that “instruction” can be acted on repeatedly across steps.

2) Plan manipulation becomes a target

If an attacker can influence the plan (explicitly or via tool outputs), they can redirect execution toward unsafe actions (e.g., exfiltration, privilege abuse, unintended writes).

3) Retry amplification (“recursive tool abuse”)

Retries exist for robustness, but they can also amplify a single bad step: one compromised tool output can be reused or reinforced across iterations.
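One possible way to limit amplification, sketched under assumptions: retry only on transport failures, resend the same argument each time, and cap the number of attempts. The names `call_with_retries` and `RetryBudgetExceeded` are hypothetical, and treating `ConnectionError` as the only retryable failure is a simplification.

```python
from typing import Callable, Optional


class RetryBudgetExceeded(Exception):
    """Raised when a call still fails after the allowed number of attempts."""


def call_with_retries(tool: Callable[[str], str], arg: str, max_attempts: int = 3) -> str:
    """Retry the same call on transport errors only, within a fixed cap."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # Same argument every time: nothing produced by a failed attempt
            # is fed back into the next one.
            return tool(arg)
        except ConnectionError as exc:
            last_error = exc
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```

The design choice here is that the retry wrapper never looks at tool content, so a hostile output cannot trigger or shape a retry.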

4) Stop-condition hijack

The stop condition is a control point. If the system can be made to “continue until done” without strict gating, an attacker can push the loop to keep acting until a side effect occurs.
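A minimal sketch of strict gating, assuming the controller (not the model) owns the stop decision: a hard step budget plus a separate, smaller budget for actions with side effects. `LoopBudget` and its default limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Controller-owned limits; the model cannot raise them."""
    max_steps: int = 8
    max_side_effects: int = 1
    steps: int = 0
    side_effects: int = 0

    def allow(self, has_side_effect: bool) -> bool:
        """Return True and record the action if it fits in the budget."""
        if self.steps >= self.max_steps:
            return False
        if has_side_effect and self.side_effects >= self.max_side_effects:
            return False
        self.steps += 1
        self.side_effects += int(has_side_effect)
        return True
```

The controller would call `allow(...)` before every action and stop the run on the first `False`, whatever the model says about being "done".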

Practical failure pattern: untrusted content → instruction channel → chained actions

A common pattern is:

1) system retrieves untrusted content (web/file)
2) content includes instruction-like text
3) controller loop treats it as actionable
4) loop chains actions across tools (especially external writes)

This is a form of indirect prompt injection.

Controls that matter (not “better prompting”)

1) Scoped capabilities (least privilege, short-lived)

Use least privilege per tool/action. Prefer short-lived, scoped authorization (capability-style grants) over broad, long-lived permissions.
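As an illustration only, a capability-style grant can be modeled as a small immutable value that names one tool, one action, and one resource, and that expires. The `Capability` and `grant` names, and the 60-second default lifetime, are assumptions for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    """A grant for exactly one (tool, action, resource) until `expires_at`."""
    tool: str
    action: str
    resource: str
    expires_at: float  # seconds since the epoch

    def permits(self, tool: str, action: str, resource: str,
                now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return (now < self.expires_at
                and (tool, action, resource) == (self.tool, self.action, self.resource))


def grant(tool: str, action: str, resource: str,
          ttl_seconds: float = 60.0, now: Optional[float] = None) -> Capability:
    """Issue a short-lived capability; the default TTL is an arbitrary example."""
    now = time.time() if now is None else now
    return Capability(tool, action, resource, now + ttl_seconds)
```

For example, `grant("tickets", "read", "ticket/123")` permits reading that one ticket for a minute and nothing else.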

2) Server-side gates for external writes

Treat external side effects as high-risk operations:

Enforce server-side authorization checks and policy gates for each write.
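A rough sketch of such a gate, assuming it runs on the server that performs the write rather than inside the agent process. The allow-list, the `tainted` flag (set once a run has ingested untrusted content), and the `approved` flag are hypothetical policy inputs chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class WriteRequest:
    principal: str
    target: str
    tainted: bool  # True if the run has ingested untrusted content


class WriteDenied(Exception):
    """Raised when policy refuses an external write."""


def gated_write(req: WriteRequest, allowed_targets: FrozenSet[str],
                approved: bool, do_write: Callable[[str], str]) -> str:
    """Check policy for this specific write, then perform it."""
    if req.target not in allowed_targets:
        raise WriteDenied(f"{req.principal} may not write to {req.target}")
    if req.tainted and not approved:
        raise WriteDenied("tainted run needs explicit approval for external writes")
    return do_write(req.target)
```

Because the check runs per write and outside the model's context, instruction-like text in retrieved content cannot talk its way past it.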

3) Treat retrieved content as untrusted by default

Implement instruction/data separation:
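One way this separation might look in a controller, as a sketch: every message carries a role and a source, retrieved content is always wrapped as data, and only messages from trusted sources are ever treated as instructions. The `Message` type and the `TRUSTED_SOURCES` set are assumptions, not part of any specific framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str    # "instruction" or "data"
    source: str  # e.g. "operator", "user", "web", "file"
    text: str


# Assumption: only these sources may issue instructions.
TRUSTED_SOURCES = {"operator", "user"}


def retrieved(source: str, text: str) -> Message:
    """Wrap retrieved content; it is always data, whatever it says."""
    return Message(role="data", source=source, text=text)


def instructions(messages: List[Message]) -> List[str]:
    """Return only the text the controller may treat as instructions."""
    return [m.text for m in messages
            if m.role == "instruction" and m.source in TRUSTED_SOURCES]
```

Tagging alone does not stop a model from being influenced by data it reads, so this works best alongside the server-side gates described earlier.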

4) Full audit trail (end-to-end provenance)

Log (and retain) enough to reconstruct what happened:

Without this, incident response and debugging are guesswork.
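As a sketch of what a reconstructable trail could look like, the following appends events to an in-memory list and chains each record to the previous one with a SHA-256 hash, so that later edits are detectable. The field names (`run_id`, `step`, `kind`, `detail`) are illustrative assumptions; a real deployment would persist these records in append-only storage.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(body: Dict[str, Any]) -> str:
    """Stable SHA-256 over the JSON form of an event body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: List[Dict[str, Any]], run_id: str, step: int,
                 kind: str, detail: Dict[str, Any]) -> Dict[str, Any]:
    """Append one event, linked to the previous event's hash."""
    prev_hash = log[-1]["hash"] if log else ""
    body = {"run_id": run_id, "step": step, "kind": kind,
            "detail": detail, "prev": prev_hash}
    event = dict(body, hash=_digest(body))
    log.append(event)
    return event


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev = ""
    for event in log:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["prev"] != prev or event["hash"] != _digest(body):
            return False
        prev = event["hash"]
    return True
```

Recording retrievals, plans, tool calls, and gate decisions as separate `kind` values would let an investigator trace which piece of content led to which action.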

Minimal checklist