Monte Carlo pricing engine v17.2: R²=0.975, MAE=$0.83, Brier +11pp vs Polymarket

From failing all 5 historical backtests to beating Polymarket by 11 percentage points. That’s the v12-to-v17 arc: 5 experiments, cumulative MAE reduction of 49%, OOS R² +6.24pp, and a pricing model that now runs autonomously every hour.
Context
The v12 oil pricing model had a respectable in-sample fit but failed every external calibration test: 0 of 5 historical analogue scenarios passed. The tail risk probability for $120+ oil was 15.4%, too conservative given documented historical shocks. The model also lacked any sell-side discipline: position exits were manual, gut-feel decisions. And every parameter update required 2-4 hours of manual work.
The v13-v17 experiment chain set out to fix all four problems: tail calibration, forecast accuracy, sell-side discipline, and update automation. The skills/monte-carlo-pricing-engine skill documents the final architecture.
What Changed
Five experiments in sequence, each targeting a specific weakness:
- v13 added 3 historical analogues (Arab 1973, Iran 1979, Iraq 2003) and raised jump size from $4 to $5/bbl. OOS R² jumped +4.98pp.
- v14 introduced autoregressive error correction (arLambda=0.4) and spike reversion mechanics, eliminating the persistent Day 16 bias and reducing MAE 33%.
- v14b replaced binary blending with sigmoid probability blending and expanded the validation dataset to 10,114 daily prices (1986-2026), producing the single largest MAE reduction: 42%.
- v16 introduced the 11-trigger sell model (100% accuracy, 0 false sells, CVaR99 -24.6%) and validated against Polymarket using Brier scoring (+11.21pp edge).
- v17 automated the entire parameter update cycle into an 8-layer geopolitical consensus pipeline running hourly with forward/backward sandbox gates.
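A minimal sketch of the two v14/v14b ideas, for orientation only. It assumes the autoregressive correction adds arLambda times the previous day's raw forecast error to the next forecast, and that sigmoid blending weights a base price against a shock-scenario price by a smooth function of the shock probability. Only arLambda=0.4 comes from the experiments; the function names, midpoint, steepness, and the base/shock split are hypothetical and may not match the skill's actual implementation.

```python
import math

AR_LAMBDA = 0.4  # arLambda from v14; all other names and values are illustrative


def ar_corrected_forecasts(raw_forecasts, actuals, ar_lambda=AR_LAMBDA):
    """Shift each raw forecast by ar_lambda times the previous raw error."""
    corrected = []
    prev_error = 0.0  # no error is known before the first observation
    for raw, actual in zip(raw_forecasts, actuals):
        corrected.append(raw + ar_lambda * prev_error)
        prev_error = actual - raw  # error of the uncorrected model (assumption)
    return corrected


def sigmoid_weight(probability, midpoint=0.5, steepness=10.0):
    """Smooth weight in (0, 1); a binary blend would jump from 0 to 1 at midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (probability - midpoint)))


def blended_price(base_price, shock_price, shock_probability):
    """Weighted mix of two scenario prices using the sigmoid weight."""
    w = sigmoid_weight(shock_probability)
    return (1.0 - w) * base_price + w * shock_price


if __name__ == "__main__":
    # A model that is persistently $2 too low gets pulled toward the actuals.
    print(ar_corrected_forecasts([100.0, 100.0, 100.0], [102.0, 102.0, 102.0]))
    # The blend moves smoothly as the shock probability rises.
    for p in (0.1, 0.5, 0.9):
        print(p, round(blended_price(90.0, 120.0, p), 2))
```

Under these assumptions, a constant raw bias b shrinks to b × (1 - arLambda) after the first step, which is one way a persistent day-specific bias could be damped. The sigmoid replaces a hard threshold so that small changes in probability near the midpoint no longer flip the blended price between two regimes.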
Impact
Cumulative across the v13-v17 chain:
- MAE: $1.633 to $0.833 (-49%)
- OOS R²: 0.8633 to 0.975 (+6.24pp net)
- Direction accuracy: +7.1pp
- Brier edge vs Polymarket: +11.21pp (82% win probability)
- Update cadence: manual 1-2x daily to autonomous hourly
Before: v12 failing all backtests, no sell discipline, manual updates. After: v17.2 beating the prediction market, 11-trigger sell model, fully autonomous hourly calibration.
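The headline metrics can be reproduced with a few lines of arithmetic. The sketch below uses the standard Brier score (mean squared difference between forecast probability and the 0/1 outcome) and assumes the "edge" is the market's Brier score minus the model's, expressed in percentage points. That definition is an assumption, and the probabilities in the usage section are made-up toy values, not the experiment data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(outcomes)


def brier_edge_pp(model_probs, market_probs, outcomes):
    """Market Brier minus model Brier, in percentage points (assumed definition).

    Lower Brier is better, so a positive edge favours the model.
    """
    market = brier_score(market_probs, outcomes)
    model = brier_score(model_probs, outcomes)
    return 100.0 * (market - model)


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Toy data only: four resolved events with model and market probabilities.
    outcomes = [1, 0, 1, 0]
    model_probs = [0.8, 0.2, 0.7, 0.1]
    market_probs = [0.6, 0.4, 0.5, 0.3]
    print("edge (pp):", round(brier_edge_pp(model_probs, market_probs, outcomes), 2))
    # MAE figures from the Impact list: $1.633 before, $0.833 after.
    print("MAE change (%):", round(pct_change(1.633, 0.833)))
```

The MAE line checks the reported reduction: (0.833 - 1.633) / 1.633 is about -49%.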
Source
- Skill: skills/monte-carlo-pricing-engine
- Experiments: experiments/oil/2026-03-16-v13-tail-risk-recalibration, experiments/oil/2026-03-17-v14-ar-error-correction, experiments/oil/2026-03-17-v14b-sigmoid-calibration, experiments/oil/2026-03-18-v16-sell-model-deesc-tuning, experiments/oil/2026-03-18-v17-realtime-consensus-pipeline