Articles
Long-form technical articles (engineering-oriented): threat models, agent/system architecture notes, evaluation methodology, and evidence/citation policy design. Use How-to for procedures and checklists; use Reference for stable lookup pages and canonical diagrams.
Latest articles
Newest published pages (auto-generated).
Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.
Theory of mind in LLMs — what benchmarks test (and what they don’t)
Evidence-anchored overview of how ToM is defined in psychology, how it is operationalized for LLM evaluation, and what current results do and do not justify.
Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
A technically grounded explanation of sycophancy (belief-agreement bias): what it is, what the evidence supports about prevalence, how preference optimization can produce it, and what changes in training and release practice reduce it.
Prompt Engineering Guide for Daily Work (Deep Dive)
A deep dive into why prompts fail in daily work, how to design evidence-bounded prompt specifications (grounded outputs), and how to evaluate them.
Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation
A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and how these constructs are operationalized in evaluations of humans and LLMs—without implying mechanism-level Theory of Mind.
LLM-Led vs Orchestrator-Led Tool Execution: Control-Plane Placement Tradeoffs
A control-plane placement comparison across reliability, observability, latency, cost governance, and security for tool-using LLM systems.
Browse by topic
Each topic page includes: Start here (choose a goal) + all pages in the section + resources.
Agent security
Trust boundaries, authorization & access control, orchestration (control-flow mechanisms), policy enforcement, observability
Agent architecture
Workflows, state & lifecycle management, tool invocation patterns, retrieval & context management, evaluation harnesses
Model training and evaluation
Reliability, evaluation methods, benchmark interpretation limits
Prompt engineering
Operational prompting notes, evidence & citation requirements, reusable templates
Start here (one per topic)
1) The attack surface starts before agents: the LLM integration trust boundary
Why agent-layer threat modeling is incomplete: the first high-leverage control point is the LLM integration trust boundary (before agent frameworks exist).
2) LLM-Led vs Orchestrator-Led Tool Execution: Control-Plane Placement Tradeoffs
A control-plane placement comparison across reliability, observability, latency, cost governance, and security for tool-using LLM systems.
3) Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong
Why fluent LLM outputs can still be wrong, and how to enforce evidence-locked answers (retrieval + provenance + fail-closed gates).
4) Prompt Engineering Guide for Daily Work (Deep Dive)
A deep dive into why prompts fail in daily work, how to design evidence-bounded prompt specifications (grounded outputs), and how to evaluate them.
All articles
Grouped by topic; within each topic sorted by published date (newest first).
Agent security (8)
Social engineering in AI systems: attacking the decision pipeline (not just people)
Threat model of social engineering against AI decision pipelines; maps prompt injection to enforcement controls outside the model (PDP/PEP, validation, budgets).
Security report (client-captured): control-plane assurance failures at the LLM boundary
A client-captured security report on text-only confirmations of privileged state and actions that lack verifiable signed audit artifacts, leaving backend state changes unverified.
Control-Plane Failure Patterns in Tool-Using LLM Systems
Two vendor-agnostic control-plane failure patterns—privilege persistence across interaction boundaries and non-enforcing integrity signals—that allow untrusted state to steer tool execution across steps.
The attack surface is the orchestration loop, not the model
How multi-step orchestration (controller) loops change the threat model in tool-using systems, and where to enforce separation, authorization, validation, and budgets to reduce prompt injection, tool misuse, unsafe writes, and unbounded consumption.
The attack surface starts before agents: the LLM integration trust boundary
Why agent-layer threat modeling is incomplete: the first high-leverage control point is the LLM integration trust boundary (before agent frameworks exist).
Agentic Systems: 8 Trust-Boundary Audit Checkpoints
A practical audit checklist of 8 trust checkpoints where untrusted artifacts can steer routing, tool use, and write-path actions in chained LLM systems.
Request Assembly Threat Model (Author-Mapped): Reading the “ChatGPT Request Assembly Architecture” Diagram
A reviewer-oriented explanation of the request path (S1–S5), context sources, and R1–R8 checkpoints in an author-mapped request-assembly model.
Prompt Assembly Policy Enforcement: Typed Provenance to Prevent Authority Confusion
Prevent authority confusion in prompt assembly by enforcing typed provenance separation between authoritative policy and untrusted content at ingress.
Agent architecture (3)
LLM-Led vs Orchestrator-Led Tool Execution: Control-Plane Placement Tradeoffs
A control-plane placement comparison across reliability, observability, latency, cost governance, and security for tool-using LLM systems.
LLM Memory Boundary Model: Context Construction (Eligibility, Selection, Persistence) and Why Answers Change
A vendor-agnostic model of context construction—what can enter context (eligibility), what gets used per response (selection), and what is retained for later (persistence)—and the security controls that must live outside the prompt.
Human vs GenAI capability map (engineering view)
A practical mapping of human cognitive capabilities to GenAI limitations, engineering substitutes, and residual gaps.
Model training and evaluation (5)
Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.
Theory of mind in LLMs — what benchmarks test (and what they don’t)
Evidence-anchored overview of how ToM is defined in psychology, how it is operationalized for LLM evaluation, and what current results do and do not justify.
Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
A technically grounded explanation of sycophancy (belief-agreement bias): what it is, what the evidence supports about prevalence, how preference optimization can produce it, and what changes in training and release practice reduce it.
Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation
A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and how these constructs are operationalized in evaluations of humans and LLMs—without implying mechanism-level Theory of Mind.
Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong
Why fluent LLM outputs can still be wrong, and how to enforce evidence-locked answers (retrieval + provenance + fail-closed gates).