Agent security
Engineering writeups on security properties of LLM-powered agentic applications (tool-using agents): trust boundaries, authorization and access control, orchestration control flow, and monitoring/policy enforcement.
Core articles
Agentic Systems: 8 Trust-Boundary Audit Checkpoints
A practical audit checklist of 8 trust checkpoints where untrusted artifacts can steer routing, tool use, and write-path actions in chained LLM systems.
The attack surface starts before agents: the LLM integration trust boundary
Why agent-layer threat modeling is incomplete: the first high-leverage control point is the LLM integration trust boundary (before agent frameworks exist).
Web-retrieved content is a prompt-injection boundary in tool-using LLM systems
Why retrieved web content must stay non-authoritative in browsing-enabled or tool-using LLM systems, and how to keep it from steering routing, tool arguments, or side effects.
Connected apps expand the capability and authorization surface of LLM systems
Why app-connected and MCP-enabled LLM systems should be analyzed as capability, scope, approval, and side-effect control problems—not only as prompt-processing systems.
The attack surface is the orchestration loop, not the model
How multi-step orchestration (controller) loops change the threat model in tool-using systems, and where to enforce separation, authorization, validation, and budgets to reduce prompt injection, tool misuse, unsafe writes, and unbounded consumption.
Control-Plane Failure Patterns in Tool-Using LLM Systems
Two vendor-agnostic control-plane failure patterns—privilege persistence across interaction boundaries and non-enforcing integrity signals—that allow untrusted state to steer tool execution across steps.
Prompt Assembly Policy Enforcement: Typed Provenance to Prevent Authority Confusion
Prevent authority confusion in prompt assembly by enforcing typed provenance separation between authoritative policy and untrusted content at ingress.
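The typed-provenance idea above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the segment types, the wrapper format, and the "policy must come first" rule are assumptions chosen to show the enforcement-at-ingress pattern.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Provenance(Enum):
    SYSTEM_POLICY = auto()   # authoritative, operator-controlled
    USER = auto()            # end-user input
    RETRIEVED = auto()       # untrusted web/tool content

@dataclass(frozen=True)
class Segment:
    text: str
    provenance: Provenance

def assemble_prompt(segments: list[Segment]) -> str:
    """Enforce provenance rules at ingress, before any model call."""
    policy = [s for s in segments if s.provenance is Provenance.SYSTEM_POLICY]
    # Exactly one authoritative policy segment, and it must lead, so
    # untrusted text can never precede (and thus reframe) the policy.
    if len(policy) != 1 or segments[0].provenance is not Provenance.SYSTEM_POLICY:
        raise ValueError("prompt assembly: policy segment missing or misplaced")
    parts = []
    for s in segments:
        if s.provenance is Provenance.SYSTEM_POLICY:
            parts.append(s.text)
        else:
            # Untrusted segments are wrapped with a marker recording their
            # origin; downstream templates must treat wrapped text as
            # data, never as instructions.
            parts.append(f"<untrusted source={s.provenance.name}>\n{s.text}\n</untrusted>")
    return "\n".join(parts)
```

The point of the sketch is that authority is a property of the type, assigned where content enters the system, rather than something inferred from the content itself.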
Request Assembly Threat Model (Author-Mapped): Reading the “ChatGPT Request Assembly Architecture” Diagram
A reviewer-oriented explanation of the request path (S1–S5), context sources, and R1–R8 checkpoints in an author-mapped request-assembly model.
Security report (client-captured): control-plane assurance failures at the LLM boundary
Client-only security report on text-only confirmations of privileged state/actions without verifiable signed audit artifacts; backend state changes not verified.
Social engineering in AI systems: attacking the decision pipeline (not just people)
Threat model of social engineering against AI decision pipelines; maps prompt injection to enforcement controls outside the model (PDP/PEP, validation, budgets).
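The PDP/PEP split mentioned above can be sketched briefly. This is an illustrative sketch only; the class names, the allowlist, and the write-budget rule are assumptions standing in for whatever policy the article actually maps, but the shape is the point: the decision logic lives outside the model, and the enforcement point is the only code path that can invoke a tool.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    writes: bool  # does this call have side effects?

@dataclass
class Decision:
    allow: bool
    reason: str

class PolicyDecisionPoint:
    """PDP: evaluates policy (allowlist + write budget); holds no tool code."""
    def __init__(self, allowed_tools: set[str], write_budget: int):
        self.allowed_tools = allowed_tools
        self.write_budget = write_budget

    def decide(self, call: ToolCall) -> Decision:
        if call.tool not in self.allowed_tools:
            return Decision(False, f"tool {call.tool!r} not in allowlist")
        if call.writes:
            if self.write_budget <= 0:
                return Decision(False, "write budget exhausted")
            self.write_budget -= 1
        return Decision(True, "ok")

def enforce(pdp: PolicyDecisionPoint, call: ToolCall, execute):
    """PEP: the sole path to tool execution; model output cannot bypass it."""
    decision = pdp.decide(call)
    if not decision.allow:
        raise PermissionError(decision.reason)
    return execute(call)
```

Because the model only ever proposes a `ToolCall` and never reaches `execute` directly, a successful injection can at worst request an action the PDP will refuse.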
Section resources
Context, reusable contracts, related links, and external baselines for this topic.
About this section
Scope
- Focus: security properties of LLM-powered agentic applications (orchestration/workflows, routing/selection, policy enforcement, session boundaries & context isolation, tool invocation, write-path enforcement).
- Output style: engineering-oriented; emphasis on testable claims, explicit system boundaries, and mitigation guidance.
- Public-safe disclosure: some writeups omit PoC strings and raw evidence artifacts; request private evidence under coordinated disclosure when required.
Non-goals (out of scope for this section)
- General application security guidance that is not specific to agentic applications and orchestration/control-flow.
- Model-training security or claims about mechanism-level cognition.
Reusable contracts
Mapped procedures and policies
- Choose allowed sources for factual answers
  Pick a facts-only boundary (allowed sources + refusal contract).
- Web Verification & Citations Policy
  When you cite web sources, enforce verification and citation rules.
- Security report (client-captured): control-plane assurance failures at the LLM boundary
  Client-observed artifacts vs. claims requiring server-side confirmation (explicitly labeled).
- Run the engineering quality gate — procedure
  Use the engineering quality gate for structural/code correctness (not writing verification).