
Monte Carlo pricing engine v17.2: R²=0.975, MAE=$0.83, Brier +11pp vs Polymarket

v12: OOS R² 0.8633, MAE $1.633, 0/5 historical backtests passing -> v17.2: OOS R² 0.975, MAE $0.833, Brier +11.21pp vs Polymarket, 82% win probability

March 30, 2026

breakthrough, oil, quant-finance


From failing all 5 historical backtests to beating Polymarket by 11 percentage points. That’s the v12-to-v17 arc: 5 experiments, cumulative MAE reduction of 49%, OOS R² +6.24pp, and a pricing model that now runs autonomously every hour.

Context

The v12 oil pricing model had a respectable in-sample fit but failed every external calibration test: none of the 5 historical analogue scenarios passed. Its tail risk probability for $120+ oil was 15.4%, which understated the risk given documented historical shocks. The model also lacked any sell-side discipline, so position exits were manual, gut-feel decisions. And every parameter update required 2-4 hours of manual work.

The v13-v17 experiment chain set out to fix all four problems: tail calibration, forecast accuracy, sell-side discipline, and update automation. The skills/monte-carlo-pricing-engine skill documents the final architecture.
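To make the tail-calibration problem concrete, here is a minimal sketch of how a Monte Carlo engine might estimate the probability of $120+ oil. It is not the engine's actual code. The additive jump-diffusion form, the starting price, the volatility, the jump frequency, the horizon, and the function name `tail_probability` are all assumptions for illustration. Only the $5/bbl jump size comes from the post, and the sketch assumes "tail probability" means the path touches the threshold at any point in the horizon.

```python
import random


def tail_probability(start_price, threshold, n_paths=5000, n_days=30,
                     daily_vol=1.5, jump_prob=0.05, jump_size=5.0, seed=0):
    """Estimate P(price touches `threshold` within `n_days`) by simulation.

    All defaults are hypothetical except jump_size, which mirrors the
    $5/bbl figure quoted in the post.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(n_paths):
        price = start_price
        peak = start_price
        for _ in range(n_days):
            price += rng.gauss(0.0, daily_vol)   # ordinary daily noise, in $/bbl
            if rng.random() < jump_prob:         # occasional upward supply shock
                price += jump_size
            peak = max(peak, price)
        if peak >= threshold:
            hits += 1
    return hits / n_paths


if __name__ == "__main__":
    # Hypothetical starting price of $100/bbl
    print(f"P($120+): {tail_probability(100.0, 120.0):.1%}")
```

In a model of this shape, raising the jump size or the jump frequency fattens the upper tail, which is the kind of lever a tail recalibration would move.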

What Changed

Five experiments in sequence, each targeting a specific weakness:

v13: added 3 historical analogues (Arab 1973, Iran 1979, Iraq 2003) and raised the jump size from $4 to $5/bbl. OOS R² rose 4.98pp.
v14: introduced autoregressive error correction (arLambda=0.4) and spike reversion mechanics, eliminating the persistent Day 16 bias and reducing MAE by 33%.
v14b: replaced binary blending with sigmoid probability blending and expanded the validation dataset to 10,114 daily prices (1986-2026), producing the single largest MAE reduction: 42%.
v16: introduced the 11-trigger sell model (100% accuracy, 0 false sells, CVaR99 -24.6%) and validated the model against Polymarket using Brier scoring (+11.21pp edge).
v17: automated the entire parameter update cycle into an 8-layer geopolitical consensus pipeline that runs hourly with forward/backward sandbox gates.
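Two of these changes are easy to illustrate in a few lines. The sketch below shows one plausible reading of "sigmoid probability blending" (a smooth weight between a base price and a shock price instead of a hard switch) and of "autoregressive error correction" (nudging each forecast by a fraction of the previous raw forecast error). Only the `arLambda=0.4` value comes from the post. The function names, the `signal`, `threshold`, and `steepness` parameters, and the exact update rule are assumptions, not the engine's implementation.

```python
import math

AR_LAMBDA = 0.4  # arLambda value quoted in the post


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binary_blend(base, shock, signal, threshold=0.0):
    """Hard switch: jump to the shock price once the signal crosses the threshold."""
    return shock if signal > threshold else base


def sigmoid_blend(base, shock, signal, threshold=0.0, steepness=1.0):
    """Smooth switch: weight the shock price by a sigmoid of the signal.

    `steepness` is a hypothetical knob; large values approach binary_blend.
    """
    w = sigmoid(steepness * (signal - threshold))
    return (1.0 - w) * base + w * shock


def ar_corrected_forecasts(raw_forecasts, actuals, ar_lambda=AR_LAMBDA):
    """Shift each forecast by ar_lambda times the previous step's raw error.

    Assumed rule: corrected[t] = raw[t] + ar_lambda * (actual[t-1] - raw[t-1]).
    """
    corrected = []
    prev_error = 0.0  # no error information before the first step
    for raw, actual in zip(raw_forecasts, actuals):
        corrected.append(raw + ar_lambda * prev_error)
        prev_error = actual - raw
    return corrected


if __name__ == "__main__":
    # Hypothetical prices: $80 base scenario, $100 shock scenario
    for signal in (-2.0, 0.0, 2.0):
        print(signal, binary_blend(80.0, 100.0, signal),
              round(sigmoid_blend(80.0, 100.0, signal), 2))
    # A forecast that runs a persistent $2 low gets pulled part of the way back
    print(ar_corrected_forecasts([80.0, 80.0, 80.0], [82.0, 82.0, 82.0]))
```

Under this assumed rule, a persistent bias shrinks by the factor `ar_lambda` from the second step onward but does not vanish, which is why a fix like this would normally sit alongside other corrections such as the spike reversion mechanics.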

Impact

Cumulative across the v13-v17 chain:

MAE: $1.633 to $0.833 (-49%)
OOS R²: 0.8633 to 0.975 (+6.24pp net)
Direction accuracy: +7.1pp
Brier edge vs Polymarket: +11.21pp (82% win probability)
Update cadence: manual 1-2x daily to autonomous hourly

Before: v12 failing all backtests, no sell discipline, manual updates. After: v17.2 beating the prediction market, 11-trigger sell model, fully autonomous hourly calibration.
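For readers unfamiliar with Brier scoring: the Brier score is the mean squared difference between forecast probabilities and the 0/1 outcomes, so lower is better. The sketch below assumes the "edge" is the market's Brier score minus the model's, expressed in percentage points. The post does not spell out its exact definition, and the probabilities and outcomes here are made-up toy values, not the model's or Polymarket's data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_edge_pp(model_probs, market_probs, outcomes):
    """Assumed definition: market Brier minus model Brier, in percentage points.

    Positive values mean the model was better calibrated than the market.
    """
    return 100.0 * (brier_score(market_probs, outcomes)
                    - brier_score(model_probs, outcomes))


if __name__ == "__main__":
    # Toy data for illustration only
    outcomes = [1, 0, 1, 0]
    model_probs = [0.9, 0.2, 0.7, 0.1]
    market_probs = [0.6, 0.4, 0.5, 0.3]
    print(f"edge: {brier_edge_pp(model_probs, market_probs, outcomes):+.2f}pp")
```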
