Model training and evaluation

Notes on evaluation and reliability of LLM outputs (single-turn and multi-step), covering calibration, evidence and grounding, and benchmark interpretation.

Start here

Pages in this section

Core pages