Agent security

Engineering writeups on the security properties of LLM-powered agentic applications (tool-using agents): trust boundaries, authorization and access control, orchestration (control-flow mechanisms), and monitoring/policy enforcement.

Pages in this section

Core pages

The attack surface is the orchestration loop, not the model
How multi-step orchestration (controller) loops change the threat model in tool-using systems, and where to enforce separation, authorization, validation, and budgets to reduce prompt injection, tool misuse, unsafe writes, and unbounded consumption.
The attack surface starts before agents: the LLM integration trust boundary
Why agent-layer threat modeling alone is incomplete: the first high-leverage control point is the LLM integration trust boundary, which exists before any agent framework is introduced.
Prompt Assembly Policy Enforcement: Typed Provenance to Prevent Authority Confusion
Prevent authority confusion in prompt assembly by enforcing typed provenance separation between authoritative policy and untrusted content at ingress.
Request Assembly Threat Model (Author-Mapped): Reading the “ChatGPT Request Assembly Architecture” Diagram
A reviewer-oriented explanation of the request path (S1–S5), context sources, and R1–R8 checkpoints in an author-mapped request-assembly model.
Security report (client-captured): control-plane assurance failures at the LLM boundary
Client-side security report on text-only confirmations of privileged state/actions issued without verifiable, signed audit artifacts; backend state changes were not verified.
Social engineering in AI systems: attacking the decision pipeline (not just people)
Threat model of social engineering against AI decision pipelines; maps prompt injection to enforcement controls outside the model (PDP/PEP, validation, budgets).
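A recurring pattern across these pages is enforcing authorization outside the model: tag each prompt segment with a provenance label at ingress, then gate tool calls at a policy enforcement point (PEP) using provenance and budgets rather than model output alone. A minimal sketch of that pattern, with hypothetical names (`Provenance`, `ToolPEP`, `send_email`) chosen for illustration, not taken from any of the writeups:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical provenance labels: where a prompt segment originated.
class Provenance(Enum):
    SYSTEM_POLICY = auto()   # authoritative, operator-controlled
    USER = auto()            # end-user input
    TOOL_OUTPUT = auto()     # untrusted retrieved/returned content

@dataclass(frozen=True)
class Segment:
    text: str
    provenance: Provenance

# Hypothetical policy enforcement point: authorizes tool calls outside
# the model, based on provenance and a simple per-session call budget.
class ToolPEP:
    def __init__(self, writable_tools: set[str], max_calls: int):
        self.writable_tools = writable_tools  # tools with side effects
        self.max_calls = max_calls            # consumption budget
        self.calls = 0

    def authorize(self, tool: str, requested_by: Provenance) -> bool:
        if self.calls >= self.max_calls:
            return False  # budget exhausted: deny regardless of source
        if tool in self.writable_tools and requested_by is Provenance.TOOL_OUTPUT:
            return False  # untrusted content must not drive write-path tools
        self.calls += 1
        return True

pep = ToolPEP(writable_tools={"send_email"}, max_calls=3)
print(pep.authorize("send_email", Provenance.USER))         # True
print(pep.authorize("send_email", Provenance.TOOL_OUTPUT))  # False
```

This is a sketch under stated assumptions, not an implementation from the linked pages; real deployments would carry provenance through prompt assembly (the typed-provenance writeup) and evaluate richer policy at a PDP before the PEP acts (the social-engineering writeup).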

Section resources

About this section

Scope

  • Focus: security properties of LLM-powered agentic applications (orchestration/workflows, routing/selection, policy enforcement, session boundaries & context isolation, tool invocation, write-path enforcement).
  • Output style: engineering-oriented; emphasis on testable claims, explicit system boundaries, and mitigation guidance.
  • Public-safe disclosure: some writeups omit PoC strings and raw evidence artifacts; request private evidence under coordinated disclosure when required.

Non-goals (out of scope for this section)

  • General application security guidance that is not specific to agentic applications and orchestration/control-flow.
  • Model-training security or claims about mechanism-level cognition.
Reuse (contracts)
Choose allowed sources for factual answers
Pick a facts-only boundary (allowed sources + refusal contract).
Web Verification & Citations Policy
When you cite web sources, enforce verification + citation rules.
Security report (client-captured): control-plane assurance failures at the LLM boundary
Client-observed artifacts vs claims requiring server-side confirmation (explicitly labeled).
Run the engineering quality gate — procedure
Use the engineering quality gate for structural/code correctness (not writing verification).
External baselines (shared terminology)
