Google's TimesFM foundation model, applied to the sequence of DQI/Brier scores from ratchet iterations, will detect convergence plateaus 3-5 iterations earlier than the current human heuristic of 'no improvement for N consecutive runs'

Changelog
| Date | Summary |
|---|---|
| 2026-04-06 | Audited: added Changelog, domain tag quant-finance, stamped last_audited |
| 2026-04-04 | Initial creation |
Hypothesis
The TimesFM research note documents a 200M-parameter time series foundation model pre-trained on 100B data points with zero-shot capability. The rubric overfitting pitfall shows that optimization loops can waste iterations pursuing diminishing returns once the metric has effectively converged. The hypothesis is that TimesFM can forecast the convergence trajectory and identify plateaus earlier than the current manual heuristic.
The oil model’s Brier score optimization (v13-v17, 5 iterations) and the LinkedIn submission rate ratchet (10 iterations) provide two real metric sequences to test against.
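For reference, the baseline being compared against ("no improvement for N consecutive runs") can be sketched as below. This is a minimal illustration, not the project's actual stopping code: the function name, the `patience` and `min_delta` parameters, and the example values are all assumptions made for the sketch.

```python
def patience_stop_index(scores, patience=3, lower_is_better=True, min_delta=0.0):
    """Return the 0-based index at which 'no improvement for `patience`
    consecutive runs' first fires, or None if it never fires.

    Hypothetical helper: `patience` plays the role of N in the heuristic.
    """
    best = scores[0]
    stale = 0  # consecutive runs without a new best
    for i, score in enumerate(scores[1:], start=1):
        gain = (best - score) if lower_is_better else (score - best)
        if gain > min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Synthetic, made-up sequence (not the oil or jobs-apply data).
example_scores = [0.30, 0.25, 0.24, 0.24, 0.24, 0.24]
stop_at = patience_stop_index(example_scores, patience=3)
```

For a Brier score, lower is better; for the LinkedIn submission rate, pass `lower_is_better=False`.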
Method
- Data preparation: extract metric sequences from both projects:
  - oil: 5 Brier score values from v13-v17 experiments
  - jobs-apply: 10 LinkedIn submission rate values from runs 1-10
- TimesFM inference: install the timesfm package (Apache license, pip installable). Feed each sequence with context length matching the full history. Forecast the next 10 iterations.
- Plateau detection: define plateau as “forecasted improvement < 0.1% for next 5 iterations.” Compare TimesFM’s plateau prediction against the actual iteration where the human stopped optimizing.
- Quantile analysis: use TimesFM’s quantile head to get confidence intervals on the forecast. Wide confidence intervals at the plateau suggest the model is uncertain, which itself is a useful signal.
- Backtest: for the LinkedIn submission sequence, run TimesFM at iteration 5 (midpoint) and check if it correctly predicts the convergence to 100% observed at iteration 10.
- Compute: TimesFM v2.5 is 200M parameters, runs on CPU in seconds. No GPU required.
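The plateau rule in the list above can be sketched independently of the forecaster. The function below takes any point forecast as a plain list, so no TimesFM API is assumed. It reads "improvement < 0.1%" as step-over-step relative improvement, which is one possible interpretation of the rule rather than a settled definition; the function name and parameters are hypothetical.

```python
def first_plateau_step(last_observed, forecast, rel_threshold=0.001,
                       window=5, lower_is_better=True):
    """Return the 0-based forecast index where a run of `window` consecutive
    steps, each improving by less than `rel_threshold` relative to the
    previous value, begins. Return None if no such run exists.

    Assumption: improvement is measured step over step, not against
    the last observed value.
    """
    prev = last_observed
    run = 0  # length of the current below-threshold run
    for i, value in enumerate(forecast):
        gain = (prev - value) if lower_is_better else (value - prev)
        rel_gain = gain / max(abs(prev), 1e-12)  # guard against division by zero
        if rel_gain < rel_threshold:
            run += 1
            if run >= window:
                return i - window + 1
        else:
            run = 0
        prev = value
    return None


# Synthetic forecast for illustration only (not model output).
demo_forecast = [0.19, 0.185, 0.185, 0.185, 0.185, 0.185, 0.185]
plateau_start = first_plateau_step(0.20, demo_forecast)
```

With a 10-step horizon and a 5-step window, a plateau can only be flagged if it starts within the first 6 forecast steps. A forecast step that gets worse also counts as "improvement below threshold" in this sketch.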
Results
Pending. Will measure:
- Plateau detection lead time (iterations saved)
- Forecast accuracy (MAPE on next-5 predictions)
- Confidence interval calibration
- CPU inference time per forecast
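Two of the planned measurements can be computed with a few lines of plain Python. The sketch below shows MAPE and the empirical coverage of a forecast interval (the fraction of actual values that fall inside the predicted lower and upper bounds). The function names are hypothetical, and the interval bounds are assumed to come from whatever quantiles the forecaster provides.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def interval_coverage(actual, lower, upper):
    """Fraction of actual values inside [lower, upper], step by step."""
    if not (len(actual) == len(lower) == len(upper)) or not actual:
        raise ValueError("inputs must be non-empty and equal length")
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)
```

A nominal 80% interval is well calibrated if its coverage is close to 0.8 over many forecasts. With only 5 to 10 points per sequence, the coverage estimate will be coarse.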
Findings
Pending.
Next Steps
If confirmed, integrate TimesFM plateau detection into the skills/karpathy-ratchet skill as an automatic stopping criterion. This would prevent the ratchet from wasting iterations once the metric has effectively converged, and also flag when the metric is still improving and the ratchet should continue.