LLM Evaluation and Model Behavior Articles

Technical articles on LLM reliability, factuality, sycophancy, benchmark interpretation, emotional-signal handling, and model-behavior limits.

Core pages

How AI Tools Read Emotional Signals in Text

A mechanism-first explanation of textual emotional signals in AI chat and agentic systems: signal interpretation, response adaptation, failure modes, and the authority boundary.

Published

When Human-Like Signals Fail: Cue Misalignment in Clowns and AI-Generated Outputs

Why clowns and some AI-generated outputs can feel unsettling: not because they are simply strange, but because they imitate human cues while disrupting the signals people rely on to read emotion, intent, realism, and coherence.

Published

Observed Classification Layers in ChatGPT

A client-side black-box analysis of observed ChatGPT classification artifacts, separating user access, prompt demand, and capability allocation.

Published

Theory of mind in LLMs — what benchmarks test (and what they don’t)

Evidence-anchored overview of how ToM is defined in psychology, how it is operationalized for LLM evaluation, and what current results do and do not justify.

Published

Sycophancy in LLM Assistants

A technically grounded explanation of sycophancy: what it is, what the evidence supports, how preference optimization can produce it, and how release practice can reduce it.

Published

Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation

A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and how these constructs are operationalized in evaluations of humans and LLMs—without implying mechanism-level Theory of Mind.

Published

Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong

Why fluent LLM outputs can still be wrong, and how to enforce evidence-locked answers (retrieval + provenance + fail-closed gates).

Published