Experiment

DQI (Data Quality Index)

definition, data-eng, metrics, composite-score

Composite score measuring how complete and correct your data is. Seven components, one number.

DQI is a weighted composite of seven data-health dimensions (completeness, freshness, accuracy, consistency, temporal spread, geographic relevance, and description quality) combined into a single 0–1 score. Think of it like a restaurant health inspection: one inspector checks the kitchen, storage, handwashing, temperatures, and expiration dates. Each area is scored independently, and together the scores tell you whether the operation is safe. A DQI above 0.95 is production-ready; a DQI below 0.85 calls for systematic remediation.

How It Works

Each dimension is scored from 0 to 1 and given a weight that reflects its business importance. The final DQI is the weighted average of the seven dimension scores. The weights are configurable, so domain priorities can shift without code changes.
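The weighted average described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name dqi, the dimension identifiers, and the example weights and scores are all assumptions made for the sketch.

```python
from typing import Mapping

# The seven dimensions named in the definition (identifier spelling is an assumption).
DIMENSIONS = (
    "completeness",
    "freshness",
    "accuracy",
    "consistency",
    "temporal_spread",
    "geographic_relevance",
    "description_quality",
)


def dqi(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted average of the seven dimension scores (each in 0..1)."""
    for d in DIMENSIONS:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"{d} score out of range: {scores[d]}")
        if weights[d] < 0:
            raise ValueError(f"{d} weight must be non-negative: {weights[d]}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result in 0..1 even when
    # the configured weights are not normalised to sum to 1.
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


# Hypothetical configuration: accuracy counts double, everything else equally.
example_weights = {d: 1.0 for d in DIMENSIONS}
example_weights["accuracy"] = 2.0

# Hypothetical scores: strong everywhere except temporal spread.
example_scores = {d: 0.9 for d in DIMENSIONS}
example_scores["temporal_spread"] = 0.4

print(round(dqi(example_scores, example_weights), 3))
```

Because the weights are passed in as plain data, they could be loaded from a configuration file, which is one way to change priorities without touching the scoring code.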

Example

A data pipeline producing event records might start with a DQI of 0.82: completeness is strong, but temporal spread is weak (events cluster in one month instead of spreading across the calendar). Running a Karpathy ratchet with DQI as the gate metric forces every change to improve the composite score. Over 20 iterations, targeted fixes to the most sensitive dimensions move the score to 0.97. The remaining gap is in temporal spread: some domains are inherently lumpy, and no configuration fully fixes that.
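The gating behaviour in this example can be sketched as a simple accept-or-revert loop. The sketch assumes that "ratchet" means a change is kept only when it strictly raises the gate metric. The names ratchet, propose, and score are hypothetical, and the candidate changes are stand-ins for real pipeline fixes.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Config = TypeVar("Config")


def ratchet(
    initial: Config,
    candidates: Iterable[Callable[[Config], Config]],
    score: Callable[[Config], float],
) -> Tuple[Config, List[float]]:
    """Apply candidate changes one at a time, keeping only strict improvements.

    `score` plays the role of the gate metric (DQI in the example above).
    Returns the best configuration found and the best score after each step.
    """
    best = initial
    best_score = score(initial)
    history = [best_score]
    for propose in candidates:
        candidate = propose(best)
        candidate_score = score(candidate)
        # The gate: a change that does not raise the score is discarded,
        # so the accepted score can never move backwards.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history
```

In this sketch a rejected change leaves the previous configuration in place, so the history of accepted scores is non-decreasing by construction.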