AI Agent Security Articles
Technical articles on LLM agent security, trust boundaries, prompt injection, authorization, tool-use controls, and orchestration risks.
Core pages
Gmail and WhatsApp Agents as Private-Message Execution Surfaces
A technical analysis of the security risks created when AI agents can read, interpret, route, or act on Gmail, WhatsApp, private message threads, attachments, links, and communication workflows.
Connected Apps and MCP Security in LLM Systems
Security analysis of connected apps, external tools, and remote MCP servers as capability, scope, approval, disclosure, and side-effect control surfaces.
Web Retrieval Prompt Injection Boundary in LLM Systems
A threat model for browsing-enabled and tool-using LLM systems where retrieved web content can steer routing, tool arguments, follow-up calls, or side effects.
LLM Boundary Assurance Failures: Client-Captured Security Report
Client-only security report on text-only confirmations of privileged state or actions without verifiable signed audit artifacts. Backend state changes are not verified.
AI Agent Orchestration Loop Attack Surface
How multi-step orchestration (controller) loops change the threat model in tool-using systems, and where to enforce separation, authorization, validation, and budgets to reduce prompt injection, tool misuse, unsafe writes, and unbounded consumption.
Prompt Assembly Policy Enforcement for LLM Systems
An engineering guide to preventing authority confusion in prompt assembly by separating authoritative policy from untrusted content with typed provenance.
Social Engineering in AI Systems and Decision Pipelines
Threat model of social engineering against AI decision pipelines; maps prompt injection to enforcement controls outside the model (PDP/PEP, validation, budgets).
LLM Integration Trust Boundary Before AI Agents
Why agent-layer threat modeling is incomplete: the first high-leverage control point is the LLM integration trust boundary (before agent frameworks exist).
Request Assembly Threat Model for AI Agents
A reviewer-oriented threat model for request assembly in AI assistants: what enters context, what gets prioritized or dropped, and where policy, tool, memory, retrieval, and audit checkpoints should be reviewed.
Control-Plane Failure Patterns in Tool-Using LLM Systems
Two vendor-agnostic control-plane failure patterns—privilege persistence across interaction boundaries and non-enforcing integrity signals—that allow untrusted state to steer tool execution across steps.
Section resources
Context, reusable contracts, related links, and external baselines for this topic.
About this section
Scope
- Focus: security properties of LLM-powered agentic applications (orchestration/workflows, routing/selection, policy enforcement, session boundaries & context isolation, tool invocation, write-path enforcement).
- Output style: engineering-oriented; emphasis on testable claims, explicit system boundaries, and mitigation guidance.
- Public-safe disclosure: some writeups omit PoC strings and raw evidence artifacts; request private evidence under coordinated disclosure when required.
Non-goals (out of scope for this section)
- General application security guidance that is not specific to agentic applications and orchestration/control-flow.
- Model-training security or claims about mechanism-level cognition.
Reusable contracts
Mapped procedures and policies
- Choose allowed sources for factual answers
  Pick a facts-only boundary (allowed sources + refusal contract).
- Web Verification & Citations Policy
  When you cite web sources, enforce verification + citation rules.
- Security report (client-captured): control-plane assurance failures at the LLM boundary
  Client-observed artifacts vs claims requiring server-side confirmation (explicitly labeled).
- Run the engineering quality gate (procedure)
  Use the engineering quality gate for structural/code correctness (not writing verification).