A research pipeline will naturally demand model escalation as easy targets are exhausted, with the transition visible in session-length growth and dominant-model shift from Sonnet to Opus over a multi-day run.
Jan 28: 569 min, $3.09, cheap-model territory (long background fetches, minimal reasoning). Jan 29-30: session count peaked at 104/day as Sonnet swept breadth targets. Jan 31: session count halved to 48, average session length grew 77% (4.27 min vs 2.4 min), Opus 4.5 dominant, cost jumped to $48.02. The pipeline naturally shifted from throughput to reasoning as easy targets were exhausted. The model-swap was not pre-planned but emerged from the target queue: complex targets that Sonnet queued for deep reasoning were processed by Opus on Jan 31.
Changelog
| Date | Summary |
|---|---|
| 2026-04-28 | Audit pass: frontmatter + structure normalization |
| 2026-01-28 | Initial creation |
Hypothesis
Research pipelines face a natural tension between breadth (covering many targets cheaply) and depth (reasoning deeply about complex targets). The hypothesis was that a multi-day research pipeline would demand model escalation as easy targets were exhausted, visible in growing session lengths and a shift in dominant model from Sonnet to Opus, rather than staying productive on a single model tier for the entire run.
Method
The investor-research pipeline ran from January 28 through January 31. No explicit model-selection logic was built in; the pipeline used whatever model was configured, and the operator (me) could swap models between runs. The experiment tracked:
| Metric | How Measured |
|---|---|
| Session count | bloomnet.db session telemetry |
| Session duration | Total minutes per day |
| Cost | API spend per day |
| Dominant model | Model with majority of tokens per day |
| Session length trend | Average minutes per session |
The key observation was whether the pipeline would naturally demand different model capabilities as it progressed through its target queue, and whether the cost profile would reflect that demand.
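The metrics in the table above can be derived from per-session rows. The following is a minimal sketch, assuming a hypothetical `Session` record; the actual bloomnet.db schema is not shown in this write-up, so every field name here is an assumption. It also treats "dominant model" as the model with the largest token share for the day, which matches a strict majority only when one model exceeds 50%.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical shape of one telemetry row; not the real bloomnet.db schema.
    day: str          # calendar day label, e.g. "Jan 31"
    minutes: float    # duration of the session
    cost_usd: float   # API spend attributed to the session
    model: str        # model that served the session
    tokens: int       # tokens consumed by the session


def daily_metrics(sessions):
    """Aggregate per-session rows into the per-day metrics tracked above."""
    by_day = defaultdict(list)
    for s in sessions:
        by_day[s.day].append(s)

    report = {}
    for day, rows in by_day.items():
        tokens_by_model = defaultdict(int)
        for s in rows:
            tokens_by_model[s.model] += s.tokens
        total_minutes = sum(s.minutes for s in rows)
        report[day] = {
            "sessions": len(rows),
            "minutes": total_minutes,
            "cost_usd": sum(s.cost_usd for s in rows),
            "avg_min_per_session": total_minutes / len(rows),
            # Model with the largest share of the day's tokens.
            "dominant_model": max(tokens_by_model, key=tokens_by_model.get),
        }
    return report
```

Calling `daily_metrics(rows)` on a list of `Session` objects returns one dictionary per day, keyed by the day label.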
Results
| Date | Sessions | Minutes | Cost | Avg min/session | Dominant model | Character |
|---|---|---|---|---|---|---|
| Jan 28 | 2 | 569 | $3.09 | 284.5 | Cheap (Sonnet) | Inaugural run, mostly fetch-wait |
| Jan 29 | ~100+ | ~250 | ~$25 | ~2.4 | Sonnet | Peak throughput sweep |
| Jan 30 | 104 | 250 | ~$30 | 2.4 | Sonnet | Continued breadth coverage |
| Jan 31 | 48 | 205 | $48.02 | 4.27 | Opus 4.5 | Deep reasoning, 77% longer sessions |
The model-swap was not pre-planned but emerged from the target queue. Sonnet exhausted the easy targets (those requiring primarily information retrieval and light summarization) by Jan 30. The remaining targets required deeper reasoning: cross-referencing multiple sources, evaluating contradictory signals, producing structured assessments. On Jan 31 the operator switched the pipeline to Opus to process them.
Daily cost rose from $3.09 to $48.02 (roughly 15.5x) even as total runtime fell from 569 minutes across 2 background sessions to 205 minutes across 48 sessions averaging 4.27 minutes each. That gap reflects the fundamental cost difference between “pipeline waiting on fetches” and “model doing hard reasoning.”
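The comparison between Jan 28 and Jan 31 can be checked with a few lines of arithmetic. This sketch uses only the figures from the Results table; the per-minute ratio is a derived number that the journal entries do not state, so treat it as illustrative.

```python
# Figures copied from the Results table.
jan28 = {"sessions": 2, "minutes": 569, "cost_usd": 3.09}
jan31 = {"sessions": 48, "minutes": 205, "cost_usd": 48.02}


def cost_per_minute(day):
    """API spend divided by total session minutes for that day."""
    return day["cost_usd"] / day["minutes"]


# Ratio of total daily spend, Jan 31 over Jan 28.
cost_ratio = jan31["cost_usd"] / jan28["cost_usd"]

# Ratio of spend per minute of runtime, which removes the effect of
# Jan 28 being one long fetch-wait run.
per_min_ratio = cost_per_minute(jan31) / cost_per_minute(jan28)

print(f"daily cost ratio: {cost_ratio:.1f}x")
print(f"Jan 28: ${cost_per_minute(jan28):.4f}/min")
print(f"Jan 31: ${cost_per_minute(jan31):.4f}/min")
print(f"per-minute cost ratio: {per_min_ratio:.0f}x")
```

Normalizing by minutes makes the gap larger than the headline daily ratio, because the Jan 28 run spent most of its time waiting rather than calling the model.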
Findings
- Model-swap emerges naturally from target complexity. No explicit routing logic was needed. The operator recognized that Sonnet was producing shallow results on complex targets and switched to Opus. A production system could automate this with a complexity classifier, but manual switching worked for the initial run.
- Cost is not the right optimization target. The $48.02 Jan 31 spend produced research outputs that the $3.09 Jan 28 run could not. Optimizing for cost would have kept the pipeline on Sonnet and produced breadth without depth. The right metric is actionable-output-per-dollar, not dollar-per-session.
- Session length is the leading indicator of model demand. When average session length grows (2.4 min to 4.27 min, a 77% increase), the pipeline is hitting targets that require more context per call. This is the signal to consider model escalation.
- Long background runs are cheap. The 569-minute inaugural run at $3.09 (about $0.005/min) confirms that research pipelines dominated by fetch-and-wait are nearly free in API terms. The cost lives in the reasoning, not the orchestration.
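The session-length signal from the findings above could be turned into a simple check. This is a sketch only: the function name and the 50% default threshold are hypothetical, and the experiment did not establish where the threshold should sit.

```python
def should_consider_escalation(baseline_avg_min, current_avg_min, threshold=0.5):
    """Return True when average session length has grown by at least `threshold`.

    `threshold` is a hypothetical tuning knob (0.5 means 50% growth); it is
    not a value measured in this experiment.
    """
    if baseline_avg_min <= 0:
        raise ValueError("baseline_avg_min must be positive")
    growth = (current_avg_min - baseline_avg_min) / baseline_avg_min
    return growth >= threshold


# Averages from the Results table: Jan 30 baseline vs Jan 31.
print(should_consider_escalation(2.4, 4.27))
```

A real trigger would likely want several days of baseline rather than a single one, since a one-day average is noisy.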
Next Steps
The model-swap pattern observed here is a candidate for automation: a complexity classifier on the target queue could route easy targets to Sonnet and complex targets to Opus without manual intervention. Session-length growth (the 77% increase) is the trigger signal.
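One way the routing step might look is sketched below. It is a rule-based stand-in for a classifier, not something that was built for this run. The `Target` fields are hypothetical and mirror the traits named in the Results section (multiple sources, contradictory signals, structured assessments); the strings `"sonnet"` and `"opus"` are tier labels, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # Hypothetical description of one queued research target.
    name: str
    source_count: int                  # sources that must be cross-referenced
    has_conflicting_signals: bool      # sources disagree with each other
    needs_structured_assessment: bool  # output is more than a summary


def complexity_score(target):
    """Count how many depth-demanding traits the target has (0 to 3)."""
    score = 0
    if target.source_count > 1:
        score += 1
    if target.has_conflicting_signals:
        score += 1
    if target.needs_structured_assessment:
        score += 1
    return score


def route(target, threshold=2):
    """Send a target to the deeper tier once its score reaches `threshold`.

    The default threshold of 2 is an assumption chosen for illustration.
    """
    return "opus" if complexity_score(target) >= threshold else "sonnet"
```

A simple lookup such as `route(Target("example", 1, False, False))` stays on the cheaper tier, while a target with several conflicting sources is sent to the deeper one. A learned classifier could replace `complexity_score` without changing `route`.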
Source
- Journal: journal/daily/2026-01-28: pipeline launch
- Journal: journal/daily/2026-01-31: model-swap to Opus, 77% longer sessions