Oil model v18.1 beats Polymarket by +2.16pp Brier score
Stale 10-day-old model, degenerate 0%/100% probabilities from MC time-window bug -> R²=0.9134, MAPE=2.26%, beats Polymarket by +2.16pp uniform Brier, 55% win probability across all thresholds

Context
The oil model had been sitting at v17 for ten days, long enough that the forward-month Monte Carlo was quietly broken in a way the fit metrics hid. Polymarket, the public prediction market on crude-price thresholds, was absorbing my mis-priced signal for free. Worse, the model was outputting degenerate probabilities (0% or 100%) on a non-trivial share of days, which means it had stopped expressing uncertainty at all. A model that never says “maybe” is a model that cannot be wrong in a useful way. Beating the crowd on a prediction market requires both calibration and sharpness. Polymarket had better calibration than v17.
What Changed
I ran a 28-step Karpathy ratchet (a pattern where every candidate change must either improve the target metric or get rejected, logged in a visible ledger) after finding the critical time-window bug inside the Monte Carlo engine. The MC tracked a peak variable keyed to mar31Day but compared it against Polymarket's April markets. For any day greater than mar31Day, every simulation locked to the actual realized peak, which produced the degenerate 0% and 100% probabilities. Once an MC has seen the future, its variance collapses.
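A toy sketch of why that lock is fatal. This is not the model's actual engine: the random-walk price process, the function names, the locked_peak argument, and every number below are assumptions chosen only to show the effect. When each path reports the same realized peak, the probability of crossing any threshold can only be 0 or 1.

```python
import random


def simulate_peaks(n_paths, start, daily_vol, n_days, locked_peak=None, seed=0):
    """Return one peak price per simulated path.

    locked_peak mimics the bug (hypothetical argument): when set, every
    path reports the already realized peak instead of its own.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_paths):
        price = peak = start
        for _ in range(n_days):
            price *= 1.0 + rng.gauss(0.0, daily_vol)  # toy random walk
            peak = max(peak, price)
        peaks.append(locked_peak if locked_peak is not None else peak)
    return peaks


def threshold_probability(peaks, threshold):
    """Share of paths whose peak reaches the threshold."""
    return sum(p >= threshold for p in peaks) / len(peaks)


free = simulate_peaks(2000, start=100.0, daily_vol=0.02, n_days=21)
locked = simulate_peaks(2000, start=100.0, daily_vol=0.02, n_days=21,
                        locked_peak=104.0)

for threshold in (103.0, 105.0, 110.0):
    print(threshold,
          threshold_probability(free, threshold),
          threshold_probability(locked, threshold))
```

The free paths give intermediate probabilities at each threshold, while the locked paths give exactly 0.0 or 1.0 depending on which side of the realized peak the threshold sits.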
Fixing the time window was step one. Step two externalized five previously-hardcoded forward MC parameters into parameters.json so the ratchet could perturb them systematically. Step three ran the automated ratchet to completion: 6 accepted parameter changes, 22+ rejected. The model now outperforms Polymarket on a uniform Brier score for the first time across the full threshold grid. The repository was initialized and pushed to github.com/Alex-Zeo/cl-futures-hormuz.
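The accept-or-reject loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the repository: the parameter names (vol_scale, drift), the inline JSON standing in for parameters.json, and the toy score function are all hypothetical. The only part taken from the text is the rule itself: a candidate change is kept only if it improves the target metric, and every attempt is written to a ledger.

```python
import copy
import json


def ratchet(params, candidates, score, ledger):
    """Try each (name, value) change; keep it only if the score drops.

    Lower score is better. Every attempt, accepted or not, is appended
    to the ledger so the full search history stays visible.
    """
    best = score(params)
    for name, value in candidates:
        trial = copy.deepcopy(params)
        trial[name] = value
        trial_score = score(trial)
        accepted = trial_score < best
        ledger.append({"param": name, "value": value,
                       "score": trial_score, "accepted": accepted})
        if accepted:
            params, best = trial, trial_score
    return params, best


# Inline stand-in for parameters.json (hypothetical keys and values).
params = json.loads('{"vol_scale": 1.0, "drift": 0.0}')


def toy_score(p):
    # Placeholder metric; a real run would re-fit the model and score it.
    return (p["vol_scale"] - 1.3) ** 2 + (p["drift"] - 0.1) ** 2


candidates = [("vol_scale", 1.2), ("vol_scale", 0.8),
              ("drift", 0.1), ("drift", 0.5)]
ledger = []
final_params, final_score = ratchet(params, candidates, toy_score, ledger)

print(json.dumps(final_params))
print(sum(entry["accepted"] for entry in ledger), "accepted of", len(ledger))
```

Because each trial is compared against the best score so far rather than the starting score, an accepted change raises the bar for every later candidate, which is what makes the process a ratchet.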
Impact
- R²: 0.9046 to 0.9134, a gain of 0.0088 (about 0.9 percentage points) on an already high baseline
- MAPE: 2.39% to 2.26%
- Autocorrelation: 0.512 to 0.380, fixing a broken AC structure that had been hiding in the residuals
- Late MAE: $3.58 to $3.22
- Uniform Brier beats Polymarket by 2.16 percentage points
- 55% win probability across all thresholds, up from roughly coin-flip
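For reference, a minimal sketch of how a Brier comparison of this kind can be computed. The forecasts and outcomes below are made-up illustrative numbers, not the model's or Polymarket's actual data, and reading "uniform" as equal weight on every threshold is an assumption. Under that reading, an edge of 2.16 percentage points would correspond to a Brier difference of 0.0216.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Every threshold gets equal weight (the assumed meaning of "uniform").
    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Illustrative numbers only: one forecast per price threshold.
outcomes = [1, 1, 0]             # did each threshold resolve "yes"?
model = [0.9, 0.6, 0.2]          # hypothetical model probabilities
market = [0.8, 0.5, 0.3]         # hypothetical market probabilities
degenerate = [1.0, 0.0, 0.0]     # a forecaster that only says 0% or 100%

edge_pp = 100.0 * (brier_score(market, outcomes) - brier_score(model, outcomes))

print("model:", brier_score(model, outcomes))
print("market:", brier_score(market, outcomes))
print("degenerate:", brier_score(degenerate, outcomes))
print("edge in percentage points:", edge_pp)
```

The degenerate forecaster scores perfectly on the thresholds it gets right but takes the maximum penalty of 1.0 on the one it gets wrong, which is why 0%/100% outputs are so costly under this metric.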
The v18.1 release is the first version where the model is both a research artifact and a competitive forecaster. Every later improvement (v19, v20, the auto-refresh launchd plist, the R²=0.904 MAPE=2.42% production build) rides on the time-window fix and the externalized parameters introduced here. Without the MC fix, every downstream optimization would have been noise on top of a corrupt signal.