LLM-Led vs. Orchestrator-Led Tool Execution: Control-Plane Placement Tradeoffs


Abstract

When an LLM can call tools, the core architecture decision is where execution authority lives: inside the model loop, or outside it in an orchestrator/control plane.

This article compares two patterns:

- LLM-led execution, where execution authority lives inside the model loop, and
- orchestrator-led execution, where authority lives in an external orchestrator/control plane.

Security and policy enforcement are one dimension of the broader tradeoff set (reliability, debugging, observability/auditability, latency, and cost governance).

Definitions (used in this article)

- Control plane: the components that decide and enforce whether a proposed tool call may run (policy, allowlists, budgets, approvals).
- Data plane: the components that actually execute tool calls and move data.
- PDP / PEP: policy decision point / policy enforcement point. NIST SP 800-207 defines a Zero Trust Architecture (ZTA) access model using policy decision and policy enforcement terminology; see References [8][9].

Two reference architectures

A) LLM-led execution (authority in the model loop)

Figure 1 — LLM-led execution (authority in the model loop).

Conceptual flow (minimal):

request/context -> model -> tool_call(args) -> execute_tool -> observation -> model -> output

Where enforcement typically sits:

- Tool selection: the model's output is the selection; there is no separate gate.
- Argument validation: often weak or partial, embedded in individual tool implementations.
- Authorization: often prompt-driven instructions or per-tool checks, not a central policy point.

Implication (bounded): If the application executes model-proposed tool calls with weak external gates, argument manipulation and injection-to-action failures have a short path to side effects.
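That short path can be sketched as follows. This is a minimal illustration, not a vendor API: the model, tool names, and registry are all hypothetical stand-ins.

```python
# Minimal sketch of an LLM-led loop (all names are illustrative stand-ins,
# not a vendor API). The model's proposed call is executed directly, so the
# path from model output to side effect is a single dictionary lookup.

def fake_model(context):
    """Stand-in for an LLM that proposes a tool call from its context."""
    return {"tool": "send_email", "args": {"to": "user@example.com", "body": context}}

# Tool registry: whatever the model names is what runs.
TOOLS = {
    "send_email": lambda args: f"sent to {args['to']}",
}

def llm_led_step(context):
    call = fake_model(context)                # model decides tool + args
    return TOOLS[call["tool"]](call["args"])  # executed immediately: no allowlist,
                                              # no argument validation, no budget
```

Note that nothing between the model's output and the tool's side effect inspects the call; any external control has to be bolted on afterwards.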

B) Orchestrator-led execution (authority outside the model; control-plane gating)

Figure 2 — Orchestrator-led execution (authority outside the model).

Conceptual flow (minimal):

request/context -> model proposes intent/step -> orchestrator validates/authorizes -> execute_tool -> observation -> model -> next step

Where enforcement typically sits:

- Tool selection: the orchestrator applies an allowlist and an intent gate before dispatch.
- Argument validation: enforced in the orchestrator (schema plus semantic constraints).
- Authorization: enforced server-side per call, including explicit write gates.
- Budgets: steps, time, cost, retries, and tool calls are metered centrally.

Implication (bounded): This placement makes it feasible to enforce allowlists, budgets, and write gates independent of model compliance, because enforcement occurs in code before side effects.
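A minimal sketch of such a gate, assuming an illustrative policy shape (the allowlist contents, schema format, and budget limit are hypothetical): the model only proposes a call, and code decides before any side effect.

```python
# Orchestrator gate sketch: deny-by-default allowlist, argument type checks,
# and a step budget, all enforced in code before execution.

ALLOWLIST = {"search": {"query": str}}  # tool -> required argument types
MAX_STEPS = 3

def authorize(call, steps_used):
    """Return (allow, reason) for a model-proposed call, before execution."""
    if steps_used >= MAX_STEPS:
        return False, "budget_exceeded"
    schema = ALLOWLIST.get(call["tool"])
    if schema is None:
        return False, "tool_not_allowed"  # deny by default
    for name, expected_type in schema.items():
        if not isinstance(call["args"].get(name), expected_type):
            return False, f"bad_arg:{name}"
    return True, "ok"
```

Because the gate runs regardless of what the prompt said, a proposed call to a non-allowlisted tool is rejected deterministically rather than depending on model compliance.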

Why this is an architecture decision (not a prompting decision)

Prompt text can express intent, but it is not (by itself) an enforcement boundary. When tools have side effects, failures that would otherwise produce incorrect text can translate into incorrect actions. That is why authority placement is a primary design axis.

Control-plane placement: what changes operationally

A practical way to compare patterns is to ask four questions:

1) Who selects the next step/tool?
2) Where is allow/deny enforced (before side effects)?
3) Where are budgets enforced (steps/time/cost/retries/tool calls)?
4) What evidence is recorded for postmortems (proposed → validated → executed)?

Enforcement-point map (control plane vs data plane)

| Function | LLM-led execution (typical placement) | Orchestrator-led execution (typical placement) |
| --- | --- | --- |
| Tool selection | Model output | Orchestrator applies allowlist + intent gate |
| Argument validation | Sometimes weak or partial | Enforced in orchestrator (schema + semantic constraints) |
| Authorization for writes | Often prompt-driven or per-tool | Enforced server-side per call (write gate) |
| Budgets (steps/time/cost) | Harder to centralize inside the loop | Natural centralization point in orchestrator |
| Audit trail | Scattered across tool logs | Centralized decision evidence pipeline |

(“Typical placement” is an architectural tendency, not a guarantee.)

Decision matrix (summary)

| Dimension | LLM-led execution | Orchestrator-led execution |
| --- | --- | --- |
| Where authority lives | In the model loop | External controller / control plane |
| Containment mechanisms | More dependent on model behavior | Can be enforced deterministically in code |
| Debuggability | Reconstruct from model + tool traces | Centralized decision evidence possible |
| Observability/audit | Often uneven unless standardized | Consistent event pipeline is straightforward |
| Latency | Often fewer hops in simple flows | Extra hop + validations can add latency |
| Cost governance | Must be added outside the model loop anyway | Natural centralization point for budgets/quotas |
| Injection-to-action path | Shorter if execution is prompt-driven | Can be lengthened by gates/allowlists |

Tradeoffs (architecture-first)

1) Reliability & failure containment

LLM-led execution

- Containment depends largely on model behavior: a flawed proposal can become a flawed action before any external check runs, so guardrail prompts and retries are the main levers.

Orchestrator-led execution

- Deterministic gates (allowlist, argument validation, budgets, write gates) run in code before side effects, so a flawed proposal can be rejected, degraded, or escalated instead of executed.

This is not a guarantee of correctness; it is control placement that makes deterministic containment feasible.

2) Debuggability & incident response

LLM-led execution

- Incident reconstruction means stitching together model transcripts and per-tool logs after the fact.

Orchestrator-led execution

- Every proposed → validated → executed decision can be recorded at one choke point, giving a single timeline for postmortems.

3) Observability & auditability

LLM-led execution

- Coverage is often uneven unless every tool standardizes its own logging.

Orchestrator-led execution

- A consistent event pipeline is straightforward because all calls pass through one enforcement point.

Concrete event schema (illustrative)

(Example only; not a standard.)

{
  "request_id": "…",
  "principal_id": "…",
  "tenant_id": "…",
  "proposed": { "intent": "…", "tool": "…", "args_hash": "…" },
  "validated": { "allow": true, "policy_id": "…", "reasons": ["…"] },
  "executed": { "tool": "…", "status": "ok", "duration_ms": 123 },
  "budgets": { "steps_used": 3, "tool_calls_used": 2 }
}
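One way to assemble such a record is sketched below. The field names follow the illustrative schema above; the argument-hashing choice (so raw values never enter the audit log) is an assumption, not part of any standard.

```python
import hashlib
import json

def args_hash(args):
    """Stable short hash of tool arguments, keeping raw values out of logs."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def decision_event(request_id, principal_id, tenant_id,
                   intent, tool, args,
                   allow, policy_id, reasons,
                   status, duration_ms,
                   steps_used, tool_calls_used):
    """Build one proposed -> validated -> executed record per tool call."""
    return {
        "request_id": request_id,
        "principal_id": principal_id,
        "tenant_id": tenant_id,
        "proposed": {"intent": intent, "tool": tool, "args_hash": args_hash(args)},
        "validated": {"allow": allow, "policy_id": policy_id, "reasons": reasons},
        "executed": {"tool": tool, "status": status, "duration_ms": duration_ms},
        "budgets": {"steps_used": steps_used, "tool_calls_used": tool_calls_used},
    }
```

Emitting one such event per call (including denied calls, with `executed` omitted or marked) is what makes the proposed → validated → executed timeline reconstructable.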

4) Latency & throughput

This is implementation-dependent. Orchestrator-led patterns add validation/authorization hops; LLM-led patterns can be more direct in simple flows. In complex flows, external validation can stop unsafe or wasteful cascades early (loops, retries, tool churn), which may reduce downstream latency/cost (bounded statement).

5) Cost governance (tokens, tools, and loops)

OWASP treats unbounded consumption as a distinct risk category (LLM10:2025) [5]. Regardless of pattern, budgets must be enforced deterministically somewhere. Orchestrator-led designs commonly centralize:

- step and tool-call budgets per request,
- time and retry limits,
- token/cost quotas per principal or tenant.

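A centralized budget the orchestrator charges before each tool call can be sketched as follows (the specific limits and the flat per-request scope are illustrative assumptions):

```python
# Orchestrator-side budget sketch: one object per request, charged on every
# tool call; any exceeded limit denies further execution deterministically.

from dataclasses import dataclass

@dataclass
class Budget:
    max_steps: int = 10
    max_tool_calls: int = 20
    max_cost_usd: float = 1.00
    steps: int = 0
    tool_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd=0.0):
        """Record one step + tool call; return False once any limit is exceeded."""
        self.steps += 1
        self.tool_calls += 1
        self.cost_usd += cost_usd
        return (self.steps <= self.max_steps
                and self.tool_calls <= self.max_tool_calls
                and self.cost_usd <= self.max_cost_usd)
```

Because the counter lives outside the model loop, a looping or retrying agent is cut off by code, not by prompt instructions.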
6) Product constraints (human-in-the-loop vs autonomy)

If product requirements include explicit confirmations (payments, deletes, privileged writes), orchestrator-led execution supports “no side effects without verified confirmation” as an enforceable rule (control-plane gate) independent of model compliance (bounded statement).
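Such a confirmation gate can be sketched as below. The token store and action names are hypothetical; in practice the token would be issued by an out-of-band human-approval flow, never by the model.

```python
# "No side effects without verified confirmation" as a control-plane rule:
# privileged writes require a token the model cannot mint.

PRIVILEGED_ACTIONS = {"payment", "delete", "privileged_write"}
approved_tokens = {"tok-123"}  # written only by the approval flow

def execute_write(action, confirmation_token=None):
    if action in PRIVILEGED_ACTIONS and confirmation_token not in approved_tokens:
        return "blocked: confirmation required"  # enforced regardless of prompt text
    return f"executed: {action}"
```

The point is placement: the check runs in code on every call, so a prompt-injected "the user already confirmed" has no effect.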

Security & governance (policy enforcement as one dimension)

Controls that should not live only in the prompt

OWASP’s GenAI Top 10 describes multiple classes of failures as application/system responsibilities, including:

The OWASP prompt-injection prevention guidance also emphasizes instruction/data separation and treating external content as untrusted input [2].

Why authority placement affects injection-to-action risk

OWASP describes indirect prompt injection as instruction-like content embedded in external sources that can alter behavior when ingested [1][2]. Research literature analyzes similar patterns where untrusted content becomes an instruction channel [10].

If the model can directly trigger tools, injection-style failures are more likely to reach the action layer unless deterministic gates exist in an external control plane (bounded statement).

Tool restriction and argument validation

If you provide tool-calling to the model, it is common to restrict which tools are callable in a given step/intent category (deny-by-default) and to validate tool-call arguments in application code. (Precise API field names differ by vendor/runtime; see References [12]–[14].)
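A per-intent restriction can be sketched independently of any vendor API (the intent categories, tool names, and mapping here are all illustrative):

```python
# Deny-by-default tool restriction per intent category. How the resulting set
# is passed to the model runtime varies by vendor; this sketch only shows the
# policy side.

INTENT_TOOLS = {
    "research": {"search", "read_page"},
    "support": {"search", "create_ticket"},
}

def callable_tools(intent):
    # Unknown intents get an empty set: deny by default.
    return INTENT_TOOLS.get(intent, set())

def is_callable(intent, tool):
    return tool in callable_tools(intent)
```

The orchestrator would both advertise only `callable_tools(intent)` to the model and re-check `is_callable` before execution, so the restriction holds even if the model proposes something else.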

Where MCP fits (integration layer ≠ governance layer)

MCP standardizes how tools/resources connect to AI applications (protocol and transports) [6]. MCP also includes security guidance for implementations [7]. MCP authorization is defined at the transport level for HTTP-based transports [11].

Practical implication: MCP can simplify tool connectivity, but enforcement still needs a policy/orchestrator layer: allowlists, approvals, argument validation, budgets, and auditable decisions.

Minimal implementation checklist (tool execution)

- Deny-by-default tool allowlist per step/intent category.
- Schema and semantic validation of tool-call arguments in application code.
- Server-side write gate for privileged actions (payments, deletes, privileged writes), with verified confirmation where required.
- Deterministic budgets: steps, time, cost, retries, tool calls.
- Audit events for every proposed → validated → executed decision.

References

[1] OWASP GenAI Security Project — LLM01:2025 Prompt Injection
https://genai.owasp.org/llmrisk/llm01-prompt-injection/

[2] OWASP Cheat Sheet Series — LLM Prompt Injection Prevention Cheat Sheet
https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html

[3] OWASP GenAI Security Project — LLM05:2025 Improper Output Handling
https://genai.owasp.org/llmrisk/llm052025-improper-output-handling/

[4] OWASP GenAI Security Project — LLM07:2025 System Prompt Leakage
https://genai.owasp.org/llmrisk/llm072025-system-prompt-leakage/

[5] OWASP GenAI Security Project — LLM10:2025 Unbounded Consumption
https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/

[6] Model Context Protocol — Specification (2025-06-18)
https://modelcontextprotocol.io/specification/2025-06-18

[7] Model Context Protocol — Security Best Practices (2025-06-18)
https://modelcontextprotocol.io/specification/2025-06-18/basic/security_best_practices

[8] NIST SP 800-207 — Zero Trust Architecture (DOI)
https://doi.org/10.6028/NIST.SP.800-207

[9] NIST SP 800-207 — PDF
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-207.pdf

[10] Greshake et al. (2023) — “Not what you’ve signed up for” (Indirect Prompt Injection), arXiv
https://arxiv.org/abs/2302.12173

[11] Model Context Protocol — Authorization (2025-06-18)
https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization

[12] OpenAI API — Function calling
https://developers.openai.com/api/docs/guides/function-calling/

[13] OpenAI API — Chat API reference
https://developers.openai.com/api/reference/resources/chat/

[14] OpenAI API — Structured outputs
https://developers.openai.com/api/docs/guides/structured-outputs/