Model training and evaluation
Notes on evaluation and reliability for LLM outputs (single-turn and multi-step): calibration, evidence/grounding, and how to interpret benchmarks.
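The pages below use “calibration” in the standard sense of agreement between a model’s stated confidence and its observed accuracy. As a minimal sketch of one common measure, expected calibration error (ECE), assuming confidences are available as numbers in [0, 1] (the function name, the 10-bin setup, and the toy numbers are illustrative assumptions, not taken from any page in this section):

```python
# Minimal ECE sketch: weighted gap between stated confidence and observed accuracy.
# Bin count, names, and the toy example below are illustrative assumptions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |confidence - accuracy| over confidence bins, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
        ece += (in_bin.sum() / len(confidences)) * gap
    return ece

# Toy example: answers given with 0.9 confidence but correct only 60% of the time -> ECE = 0.3
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0, 0]))
```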
Start here
Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong
A reliability baseline: why fluent text is not evidence of correctness
Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
Agreement bias under user belief priming (sycophancy)
Theory of mind in LLMs — what benchmarks test (and what they don’t)
Benchmark interpretation limits (what these tests do and do not show)
Choose allowed sources for factual answers
An operational evidence boundary for publishable claims
Pages in this section
Core pages
Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation
A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and for how these constructs are operationalized in evaluations of humans and LLMs, without implying mechanism-level Theory of Mind.
Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.