Journal

14 iterations until the oil model beats Polymarket on WTI accuracy

March 15, 2026

quant-finance, career, data-eng, reconstructed-from-sessions

2026-03-16

Signal

Running 14 iterative sessions against a single model until it beat Polymarket accuracy on WTI CL illustrates that systematic iteration against a fixed evaluation criterion converges faster than unsystematic improvement. The criterion is the accelerant, not the number of sessions.

Evidence

Project: projects/oil/_index: 14 interactive sessions; model progression v13 → v14 → v15
Milestone: Beat Polymarket accuracy target on WTI CL
Architecture improvements: Geopolitical Recommendation Accuracy: Layered Defense System; cron jobs refreshed for live dashboard
Weaknesses found in v13: Identified via full quant audit with 20-year experience persona; addressed in v14/v15 progression
Project: projects/jobs-apply/_index: Expanded job title coverage, added new channels, workers, and sub-agents
Project: internal audit: Pipeline-Integrated Feed Quality System: 1,102 null-date events discovered; quality gates added to pipeline stages
Volume: 154 automated code-review sessions (oil: 112, autosearch: 32)

So What (Why Should You Care)

The projects/oil/_index model iteration pattern is a template for any complex model optimization. Fourteen interactive sessions in one day sounds like thrashing. It isn’t, because every session was evaluated against the same fixed criterion: beat Polymarket accuracy on WTI CL. The criterion didn’t change between sessions. The model changed; the standard didn’t.

Without a fixed criterion, iteration becomes guesswork. “This version seems better” is not a useful evaluation: it lacks a comparison baseline and a consistent measurement. With a fixed criterion, each session either moves the needle on the metric or doesn’t, and you accumulate real progress rather than circular exploration. The criterion also determines when to stop: you stop when you’ve beaten it, not when you feel like you’ve done enough work.
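A minimal sketch of what "evaluate every session against the same fixed criterion" could look like in code. The journal does not say which accuracy metric was used, so the Brier score here is an assumption, and `FixedCriterion`, `iterate_until_beaten`, and all probabilities in the usage example are hypothetical illustrations rather than the project's actual code or data.

```python
from dataclasses import dataclass


def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


@dataclass(frozen=True)
class FixedCriterion:
    """The standard every session is judged against; frozen so it cannot drift."""
    baseline_probs: tuple  # hypothetical market-implied probabilities
    outcomes: tuple        # realized 0/1 outcomes for the same questions

    def baseline_score(self):
        return brier_score(self.baseline_probs, self.outcomes)

    def beats_baseline(self, model_probs):
        return brier_score(model_probs, self.outcomes) < self.baseline_score()


def iterate_until_beaten(criterion, candidates):
    """Score candidates in order and stop at the first one that beats the baseline."""
    history = []
    for name, probs in candidates:
        history.append((name, brier_score(probs, criterion.outcomes)))
        if criterion.beats_baseline(probs):
            return name, history
    return None, history


# Illustrative numbers only; none of these come from the oil project.
criterion = FixedCriterion(baseline_probs=(0.6, 0.4, 0.7, 0.3), outcomes=(1, 0, 1, 0))
candidates = [
    ("v13", (0.5, 0.5, 0.5, 0.5)),
    ("v14", (0.6, 0.45, 0.65, 0.35)),
    ("v15", (0.8, 0.2, 0.8, 0.2)),
]
winner, history = iterate_until_beaten(criterion, candidates)
```

The frozen dataclass is the point of the sketch: the baseline and outcomes are set once, so each candidate version is compared with the same numbers and the loop has an explicit stopping condition.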

The Layered Defense System for geopolitical recommendations is the architectural answer to a hard problem in quantitative modeling: geopolitical events are inherently uncertain, and any single model component that tries to represent that uncertainty will have brittle edges. The layered defense approach stacks multiple independent signals, each one imperfect, so that the combined recommendation is more robust than any individual component. It’s the same principle behind ensemble models in machine learning.

The 1,102 null-date events discovered in the internal audit today show the same “define the metric first” principle applied to data quality. You can’t measure how many null-date events exist until you define what a null-date event is. The Pipeline-Integrated Feed Quality System defined it, instrumented for it, and found 1,102 of them in production data, a number that was invisible before the check existed.
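As a sketch of "define it, then instrument for it", a null-date check and a pipeline gate might look like the following. The journal does not give the actual definition used by the Pipeline-Integrated Feed Quality System, so the rule here (missing, empty, or not ISO 8601 parseable) is an assumption, and the field name `date` and the functions `is_null_date` and `quality_gate` are hypothetical.

```python
from datetime import datetime


def is_null_date(event, field="date"):
    """Assumed definition: the date field is missing, empty, or not ISO 8601 parseable."""
    value = event.get(field)
    if value is None or (isinstance(value, str) and not value.strip()):
        return True
    if isinstance(value, datetime):
        return False
    try:
        datetime.fromisoformat(str(value))
    except ValueError:
        return True
    return False


def quality_gate(events, max_null_fraction=0.0, field="date"):
    """Count null-date events in a batch and report whether the batch passes the gate."""
    null_count = sum(1 for event in events if is_null_date(event, field))
    fraction = null_count / len(events) if events else 0.0
    return {
        "total": len(events),
        "null_dates": null_count,
        "passed": fraction <= max_null_fraction,
    }


# Illustrative batch; these events are made up.
events = [
    {"id": 1, "date": "2026-03-15"},
    {"id": 2, "date": None},
    {"id": 3},
    {"id": 4, "date": ""},
    {"id": 5, "date": "2026-03-15T08:30:00"},
]
report = quality_gate(events)
```

Once the definition lives in one function, the count becomes a number the pipeline can log at every stage and compare over time.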

Both the oil model and the internal audit pipeline ran the same playbook today: define the evaluation criterion, build the measurement infrastructure, then iterate. The order matters. Starting with iteration before you have measurement is activity without progress.

The v13 → v14 → v15 model progression across 5+ iteration cycles within a single day also demonstrates that rapid iteration is possible when the evaluation infrastructure is fast enough. If each model evaluation took hours to run, 5+ cycles in a day would be impossible. Because the Polymarket accuracy comparison can be computed quickly, the feedback loop is tight enough to support multiple cycles per session.

The design question behind the Layered Defense System is one that single-model approaches struggle with: how do you make confident recommendations when the input data (geopolitical events) is inherently uncertain? Each layer in the defense system represents a different signal source or analytical approach. The combined output is more robust than any individual layer because independent signals that agree on a recommendation provide stronger evidence than any single signal alone.
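The claim that agreeing independent signals provide stronger evidence than any one alone can be made concrete with log-odds pooling. This is a sketch of one standard way to combine probabilities, not a description of how the Layered Defense System actually combines its layers, which the journal does not specify. The function name `combine_independent` and the example probabilities are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def combine_independent(signal_probs, prior=0.5):
    """Pool layer probabilities by summing each layer's log-odds shift from the prior.

    Assumes the layers are conditionally independent given the outcome
    (a naive-Bayes-style assumption).
    """
    for p in list(signal_probs) + [prior]:
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in signal_probs)
    return 1.0 / (1.0 + math.exp(-total))


# Three hypothetical layers that each lean the same way at 0.7.
agreeing = combine_independent([0.7, 0.7, 0.7])
# Two hypothetical layers that disagree symmetrically cancel out.
conflicting = combine_independent([0.7, 0.3])
```

Under the independence assumption, three layers at 0.7 pool to roughly 0.93, while two layers that disagree symmetrically fall back to the prior. Real signal sources are rarely fully independent, and correlated layers make this kind of pooling overconfident.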

The 154 automated code-review sessions, split predominantly between oil (112) and autosearch (32), reflect where the active development was. The 112 oil sessions suggest significant code changes in support of the model iterations: parameter extraction, scenario testing, dashboard updates. The 32 autosearch sessions reflect maintenance activity rather than active development, consistent with the project being in wind-down mode.

What’s Next

Oil model: validate v15 against holdout data before promoting to production
Internal audit pipeline: quality gates are deployed; monitor null-date detection rates going forward

Log

projects/oil/_index: 14 interactive sessions: massive model iteration day
v13 structural weaknesses identified via full quant audit for model assessment (20-year experience persona)
v14 → v15 progression across 5+ iteration cycles
Beat Polymarket accuracy target on WTI CL: key milestone
Geopolitical Recommendation Accuracy: Layered Defense System implemented
Cron jobs refreshed for live dashboard
projects/jobs-apply/_index: expanded coverage for additional job titles; added new channels, workers, and sub-agents
internal audit: Pipeline-Integrated Feed Quality System implemented
1,102 null-date events discovered systemically
Quality gates added to pipeline stages
154 automated code-review sessions (oil: 112, autosearch: 32)