Experiment

Baseline

April 4, 2026

definitionmlstatisticsmethodology

The starting measurement everything gets compared to. Without one, improvement is just a claim.

A baseline is the measurement taken before you change anything : the “before” in every before-and-after story. Without it, a score of 92% accuracy floats in space: impressive against what? The baseline converts abstract metrics into concrete progress: “we reduced error by 34% relative to where we started.” Three requirements for a useful baseline: measure before you change (the most common failure is retroactive measurement), use the same metric and method as your experiments, and document the conditions fully (model version, data window, evaluation procedure).

How It Works

Record current metric → document conditions precisely → run experiment → compare against baseline. The baseline is the initial value in a Karpathy ratchet : the floor that can only go up.

Example

Three projects anchored to baselines: Oil model MAE started at $1.12 (anchored every tuning round). LinkedIn engine started at 0% submission rate in Run 1 (ratcheted to 100% in Run 10). Brier score uses Polymarket’s published score as an external baseline. The $1.12 or 0% only matters because it was documented before anything changed.

How It Works

Example

Related