Model training and evaluation
Notes on evaluation and reliability for LLM outputs (single-turn and multi-step): calibration, evidence/grounding, and how to interpret benchmarks.
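The pages below use “calibration” in the standard sense of agreement between a model’s stated confidence and its observed accuracy. As a minimal sketch of one common measure, expected calibration error (ECE), assuming confidences are available as numbers in [0, 1] (the function name, the 10-bin setup, and the toy numbers are illustrative assumptions, not taken from any page in this section):

```python
# Minimal ECE sketch: weighted gap between stated confidence and observed accuracy.
# Bin count, names, and the toy example below are illustrative assumptions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |confidence - accuracy| over confidence bins, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
        ece += (in_bin.sum() / len(confidences)) * gap
    return ece

# Toy example: answers given with 0.9 confidence but correct only 60% of the time -> ECE = 0.3
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0, 0]))
```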
Start here
Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong
A reliability baseline: why fluent text is not evidence of correctness
Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
Agreement bias under user belief priming (sycophancy)
Theory of mind in LLMs — what benchmarks test (and what they don’t)
Benchmark interpretation limits (what these tests do and do not show)
Choose allowed sources for factual answers
An operational evidence boundary for publishable claims
Pages in this section
Core pages
Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation
A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and for how these constructs are operationalized in evaluations of humans and LLMs, without implying mechanism-level Theory of Mind.
Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.