Journal

15,630 Output Tokens From 4 Input Tokens: Cached Context at Work

reviewsparsebrandhousetelemetry-gap

11 tracked sessions, $3.91, three active days. The other four days produced no session telemetry despite measurable git activity and at least two landed commits. The single most efficient session of the week generated 15,630 output tokens from 4 input tokens at 100% cache hit, reading 283,174 tokens from the warm prompt cache. The gap between what the session record shows and what the commit graph shows is the week’s most important observation.

The 100% Cache-Hit Session: 15,630 Output Tokens From 4 Input Tokens

One session on Feb 18 produced 15,630 output tokens against 4 input tokens, at 100% cache hit, reading 283,174 tokens from the prompt cache. The cost: $0.45.

The ratio is not a rounding artifact. The session read everything it needed from the warm cache and generated from almost nothing. Net new input was negligible. The model already knew the full context; the session provided the task delta.

This is the design working correctly. Prompt caching is opt-in per-request, requiring cache_control headers and the appropriate beta flag. Stable prompt structures, pre-warmed, allow each session to act as a thin layer of steering over a pre-loaded foundation rather than re-transmitting the full context on every call.

The corresponding commit, skl-engine #89, added 822 lines and removed 89 across 14 files: a cache and logger fix. Cache and logger fixes in data pipelines compound downstream. A corrected logger surfaces previously silent failures. A fixed cache avoids redundant fetches that were silently degrading throughput. The session’s token profile tells the story: the system prompt was large, stable, and warm; the work was surgical.

The week’s overall cache efficiency confirms the pattern holds across all three active projects: 99.35% average hit rate, with Feb 18 reaching exactly 100%. Zeroclaw ran a warm-context ratio of approximately 293

(2,203,535 cache-read tokens against 7,516 net input tokens). That ratio is consistent with a project that runs against a large, stable system prompt where each session is a thin steering layer over a pre-warmed foundation. The marginal cost per session approaches zero when the cache is warm and the task is narrow.

The Feb 18 session also produced the week’s only green MQI day: 0.5481, composite Z of +0.1208. The two preceding days ran haiku-dominant at warning levels (0.2942, 0.2858). The model-mix shift on Feb 18 to sonnet-dominant correlates with the MQI crossing into green. One session, one commit, one day.

Transferable insight: The most efficient AI session is one where the model already knows everything and the input provides only the delta. Prompt caching infrastructure is the prerequisite; stable prompt design is the discipline.

When Commits Don’t Need Sessions: The Batch-Merge Blind Spot

Mixin: what AI telemetry undercounts by design

Four consecutive git-only days (Feb 19-22) with no session telemetry. The daily journals escalate the concern over those four days: from “I should check whether the ingester broke” to “the backfill question is now urgent” to the Feb 22 close: “large git-log days without session telemetry are probably under-credited in any retrospective. If I ever tune the journaling pipeline based purely on session counts, days like this get treated as holidays, which is wrong.”

The structural issue is that bloomnet.db tracks interactive sessions. Batch merges, automated runs, and non-interactive git activity don’t produce session records. When four consecutive days show zero sessions against measurable commits, the telemetry system reports silence where there was activity. Any weekly rollup built on session counts for this period understates throughput.

The fix described in the Feb 22 daily: instrument commit activity as a fallback signal so stream divergence surfaces as an alert rather than a gap. Two streams (sessions, commits) should agree; when they diverge by more than a threshold, the divergence is the signal.

The brandhouse_ppt pattern illustrates the positive case. Feb 17’s single largest commit: 6,899 additions, 149 deletions, 26 files changed. Session count for brandhouse that day: 4, averaging 0.6 minutes each. The sessions were review-and-commit flows, not generation runs. The commit volume far outweighs what the session record implies. Haiku handled 98% of the day’s token budget with no quality complaints in the committed output.

Transferable insight: AI development telemetry designed for interactive sessions systematically undercounts batch and merge work. Commits are a parallel signal stream; treating them as supplementary rather than primary leads to wrong throughput estimates.

Informal Hypotheses vs Tracked Experiments

Mixin: where optimization opportunities die

The Feb 17 daily surfaces a question: can Haiku handle the same class of mechanical refactor that previously required Opus oversight? Haiku ran at 98% of the token budget across 4 brandhouse sessions. The output passed review. The natural follow-up is a deliberate A/B split: same task class, half on Haiku, half on Sonnet, compare review feedback.

That experiment was not run.

The gap between “we noticed something interesting” and “we tracked it systematically” is where most optimization opportunities die. The haiku-vs-opus cost-floor question would have required one additional brandhouse task, instrumented as two parallel sessions with a review pass at the end. The observation is clear enough to point at a hypothesis; the hypothesis was not hardened into a test.

The consequence is asymmetric. If Haiku can match Sonnet on this task class, the cost floor for mechanical refactors drops materially. If it cannot, the boundary is known and the decision to use Opus is principled rather than habitual. Either outcome has value. The untracked hypothesis produces neither.

Transferable insight: The gap between noticing a pattern and instrumenting it as an experiment is where most AI cost optimization opportunities are lost. Observations that are clear enough to form a hypothesis are worth one more session to test.

Zeitgeist

@heygurisingh
Google CodeWiki turns any GitHub repo into an interactive guide: paste a URL, get architecture diagrams, walkthroughs, and a code-aware chatbot.
@HipCityReg
“Situation Monitor”: one dashboard for global activity, markets, predictions, and AI news: open-source geospatial intelligence with real-time threat mapping.
@almmaasoglu
WebGPU in-browser AI: no server, no signup, runs fully locally: silky-smooth performance using the GPU already in your machine.

By the Numbers

MetricValue
Sessions11
Total cost$3.91
Active days with telemetry3
Git-only days4 (Thu-Sun)
Avg cache hit rate99.35%
Peak MQI0.5481 (Feb 18, green)
MQI delta vs prior week+0.1226
Top session: output tokens15,630
Top session: input tokens4
brandhouse_ppt commit additions6,899
skl-engine commit additions822

Changelog

260507: Generated by journalize-weekly (topic-first format, v2 regeneration)

Rewrote from per-project format to topic-first (sparse). Primary: 100% cache-hit session. Mixins: batch-merge telemetry blind spot + informal vs tracked experiments. Stripped private project refs from frontmatter and body.