Replacing the uniform Brier reference ($90-$220) with a market-implied midpoint PDF, adding a squared-deviation calibration penalty (λ=0.5) against both Polymarket and Kalshi, enforcing April <= Yearly monotonicity, and computing at $1 granularity (141 thresholds matching Robinhood contracts) will bring model probabilities within 2-5pp of market consensus, instead of the 50pp divergence caused by the gamed v25.1 ratchet.
260414b final. Three-layer fix: (1) Calibration-proper scoring eliminated Goodhart gaming (calPen 0.216->0.060). (2) Proportional anchor scales forward premiums by actual/model ratio. (3) Known ceasefire injection: all MC paths start from Day 39 ceasefire + 5%/day recovery. Result: Day 46 median $122->$97. Forward forecast now consistent with $92 spot. Public lab deployed with live R2 data, event markers on all charts, $1 threshold granularity.

Changelog
| Date | Summary |
|---|---|
| 2026-04-14 | Created. Full pipeline: data refresh Day 45, dual-platform ratchet (gamed), caught Goodhart pitfall, calibration-proper fix, structural recalibration (proportional anchor + known ceasefire injection + 5%/day recovery). Public Lab deployed: oil dashboard live at /projects/oil/ with R2 data pipeline, event markers on all charts. |
Hypothesis
The v25.1 ratchet gamed the Brier-under-uniform metric by making the model maximally bullish (76% for $120 when Polymarket (PM) says 16.5%). Under the uniform reference, every hypothetical peak from $90 to $220 gets equal weight, so being bullish wins because most of the reference range lies above $120. An ICE-style quant approach instead uses a market-implied reference and calibration constraints to produce small, defensible edges backed by structural information, not metric gaming.
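To make the bias concrete, the sketch below computes the expected Brier score at the $120 threshold when the peak is treated as uniform on $90-$220. The helper names (`uniformExceedance`, `expectedBrier`) are hypothetical, and the continuous-uniform treatment is an assumption; the actual scoring code in ratchet.mjs may differ.

```javascript
// Hypothetical helpers for illustration; not the actual ratchet.mjs code.

// P(peak >= k) when the peak is assumed uniform on [lo, hi].
function uniformExceedance(k, lo = 90, hi = 220) {
  if (k <= lo) return 1;
  if (k >= hi) return 0;
  return (hi - k) / (hi - lo);
}

// Expected Brier score of forecast p when the event occurs with probability q:
// E[(p - o)^2] = q * (1 - p)^2 + (1 - q) * p^2.
function expectedBrier(p, q) {
  return q * (1 - p) ** 2 + (1 - q) * p ** 2;
}

const q120 = uniformExceedance(120);            // 100 / 130
const gamedModel = expectedBrier(0.762, q120);  // gamed v25.1 forecast for $120
const market = expectedBrier(0.165, q120);      // PM price for $120

console.log(q120.toFixed(3), gamedModel.toFixed(3), market.toFixed(3));
// logs: 0.769 0.178 0.543 (lower is better)
```

Under this reference the "true" probability of a $120 peak is about 77%, so a 76.2% forecast scores almost perfectly while the 16.5% market price is heavily penalized.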
Method
Four fixes applied to ratchet.mjs and quant-engine.js:
- Market-implied reference: Brier scored under a midpoint blend of the model and market-implied PDFs. This makes the model-vs-market Brier delta ~0 by construction (both forecasters are equally accurate under the fair reference).
- Calibration penalty: objective = -Σ(model_prob - market_prob)² / n_thresholds * 100. Squared deviation penalizes large disagreements quadratically. With λ=0.5, a 10pp deviation costs 0.5pp per threshold and a 50pp deviation costs 12.5pp.
- Monotonicity gate: P(April peak >= $X) <= P(Yearly peak >= $X) for all unresolved thresholds. Resolved thresholds (below the actual peak of $115.81) are exempted.
- $1 granularity: the MC engine computes at 141 thresholds ($80-$220 at $1 steps), up from 17. This matches the Robinhood/Kalshi daily contracts, which trade at $1 intervals.
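The penalty, gate, and grid described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in ratchet.mjs or quant-engine.js: the function names and array-based data shapes are hypothetical, probabilities are fractions in [0, 1], and λ is assumed to multiply the mean squared deviation (which reproduces the 0.5pp and 12.5pp figures).

```javascript
// Hypothetical sketch; names and data shapes are assumptions.

// $1 grid: 80, 81, ..., 220.
function thresholdGrid(lo = 80, hi = 220, step = 1) {
  const grid = [];
  for (let k = lo; k <= hi; k += step) grid.push(k);
  return grid;
}

// Penalty magnitude = lambda * mean squared deviation * 100.
// The objective in the text is the negative of this (it is maximized).
function calibrationPenalty(modelProbs, marketProbs, lambda = 0.5) {
  if (modelProbs.length === 0 || modelProbs.length !== marketProbs.length) {
    throw new Error("probability arrays must be non-empty and equal length");
  }
  let sumSq = 0;
  for (let i = 0; i < modelProbs.length; i++) {
    sumSq += (modelProbs[i] - marketProbs[i]) ** 2;
  }
  return (lambda * sumSq / modelProbs.length) * 100;
}

// Returns unresolved thresholds where P(April >= X) exceeds P(Yearly >= X).
// Thresholds at or below the actual peak are treated as resolved and skipped.
function monotonicityViolations(thresholds, aprilProbs, yearlyProbs,
                                actualPeak = 115.81, tol = 1e-9) {
  return thresholds.filter(
    (k, i) => k > actualPeak && aprilProbs[i] > yearlyProbs[i] + tol
  );
}

console.log(thresholdGrid().length);                          // 141
console.log(calibrationPenalty([0.30], [0.20]).toFixed(2));   // "0.50" (10pp gap)
console.log(calibrationPenalty([0.70], [0.20]).toFixed(2));   // "12.50" (50pp gap)
console.log(monotonicityViolations(
  [110, 120, 130], [1.0, 0.45, 0.30], [0.9, 0.50, 0.25]
));                                                           // [ 130 ]
```

In the last call the $110 threshold is skipped because it lies below the $115.81 peak, and only $130 fails the gate.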
Results
| Metric | Before (gamed v25.1) | After (260414 calibrated) |
|---|---|---|
| tailProb | 0.50 | 0.20 |
| CalPen | 0.216 | 0.060 |
| $120 April | 76.2% | 44.9% (PM: 16.5%) |
| $130 April | 58.9% | 25.5% (PM: 9.5%) |
| $150 April | 48.9% | 17.8% (PM: 4.0%) |
| $200 April | 27.4% | 5.6% (PM: 1.1%) |
27 accepted parameter changes across 30 iterations. Primary driver: tailProb reduction from 0.50 to 0.17, then fine-tuned back to 0.20 after other params settled.
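The gap to Polymarket implied by the table can be tabulated directly. The snippet below only re-derives differences from the published April rows; the variable names are hypothetical and no additional data is assumed.

```javascript
// Values copied from the results table (April contracts, in percent).
const rows = [
  { threshold: 120, before: 76.2, after: 44.9, pm: 16.5 },
  { threshold: 130, before: 58.9, after: 25.5, pm: 9.5 },
  { threshold: 150, before: 48.9, after: 17.8, pm: 4.0 },
  { threshold: 200, before: 27.4, after: 5.6, pm: 1.1 },
];

// Round to one decimal place to hide floating-point noise.
const round1 = (x) => Math.round(x * 10) / 10;

// Model-minus-market gap in percentage points, before and after the fix.
const edges = rows.map((r) => ({
  threshold: r.threshold,
  edgeBefore: round1(r.before - r.pm),
  edgeAfter: round1(r.after - r.pm),
}));

for (const e of edges) {
  console.log(`$${e.threshold}: ${e.edgeBefore}pp -> ${e.edgeAfter}pp`);
}
// $120: 59.7pp -> 28.4pp
// $130: 49.4pp -> 16pp
// $150: 44.9pp -> 13.8pp
// $200: 26.3pp -> 4.5pp
```

The gap to PM shrinks at every listed threshold, from roughly 26-60pp before the fix to roughly 5-28pp after it.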
Findings
- Uniform Brier reference is fundamentally broken for ratchet optimization. It rewards being maximally bullish because the reference overweights high-peak scenarios. This is an instance of Goodhart’s Law, documented as a pitfall.
- Market-implied midpoint Brier is symmetric by construction: model and market score equally (~0 delta). The objective therefore reduces to the calibration penalty alone, which is the intended behavior: the ratchet should minimize deviation from the market, not try to beat the market on Brier.
- Calibration penalty alone produces reasonable probabilities. The model naturally converges to 5-28pp above market (structural edge from Hormuz traffic, blockade data, mine clearance timelines) without any explicit “edge targeting.”
- $1 threshold granularity is essential. Robinhood’s Kalshi daily contracts are at $1 intervals. Computing at $10 intervals meant the model couldn’t price 93% of tradeable contracts.
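The symmetry of the midpoint reference noted in the findings can be checked numerically. The sketch assumes the blend is applied per threshold to exceedance probabilities (an equal-weight mixture of two PDFs has an exceedance probability equal to the average of the two); `midpointBrierDelta` is a hypothetical name, not a function from ratchet.mjs.

```javascript
// Expected Brier score of forecast p when the event occurs with probability q.
// Algebraically this equals (p - q)^2 + q * (1 - q).
function expectedBrier(p, q) {
  return q * (1 - p) ** 2 + (1 - q) * p ** 2;
}

// Brier(model) - Brier(market) when the reference is the midpoint of the two.
// Both forecasts sit the same distance from the midpoint, so the delta cancels.
function midpointBrierDelta(modelProb, marketProb) {
  const q = (modelProb + marketProb) / 2;
  return expectedBrier(modelProb, q) - expectedBrier(marketProb, q);
}

// $120 April: calibrated model 44.9% vs PM 16.5%.
const delta = midpointBrierDelta(0.449, 0.165);
console.log(Math.abs(delta) < 1e-12); // true
```

Because the delta vanishes for any pair of forecasts, the Brier term cannot distinguish model from market under this reference, which is why the calibration penalty carries the whole objective.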