Journal

Audit pipeline iteration: light scoring adjustments

March 19, 2026

scoring, iteration

2026-03-20

Signal

Continued audit pipeline work with slightly more engagement than the previous day: 22 sessions and 4 tool calls, which suggests minor adjustments to scoring parameters or pipeline configuration. The two-day pattern of monitor-then-adjust is a textbook ratchet cadence: observe one day, tune the next, and let the small increments compound.

Evidence

22 pipeline sessions logged, with 5,611 tokens in total (about 255 per session) and 4 tool calls. The small token count combined with the handful of tool calls is the signature of configuration tweaks rather than feature work. A real feature day runs into the tens of thousands of tokens and hundreds of tool calls. A real firefight does too. Four tool calls across 22 sessions is the profile of someone opening a config, reading a value, changing one number, and moving on.
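The rule of thumb above can be written down as a small classifier. This is only a sketch: the `DayStats` type, the `classify_day` function, and both cutoffs are hypothetical names and numbers I picked to match "tens of thousands of tokens" and "hundreds of tool calls", not anything the pipeline actually defines.

```python
from dataclasses import dataclass


@dataclass
class DayStats:
    sessions: int
    tokens: int
    tool_calls: int


# Hypothetical cutoffs, chosen only to mirror the prose:
# "tens of thousands of tokens" and "hundreds of tool calls".
HEAVY_TOKENS = 10_000
HEAVY_TOOL_CALLS = 100


def classify_day(stats: DayStats) -> str:
    """Label a day from its raw counts (assumed heuristic, not a pipeline feature)."""
    if stats.tool_calls == 0:
        return "monitoring"
    # Assumption: either signal alone is enough to call the day heavy.
    if stats.tokens >= HEAVY_TOKENS or stats.tool_calls >= HEAVY_TOOL_CALLS:
        return "feature work or firefight"
    return "configuration tweaks"


today = DayStats(sessions=22, tokens=5_611, tool_calls=4)
print(classify_day(today))                          # configuration tweaks
print(round(today.tokens / today.sessions))         # 255 tokens per session
print(round(today.tool_calls / today.sessions, 2))  # 0.18 tool calls per session
```

The per-session averages are the more telling numbers: fewer than one tool call for every five sessions means most sessions touched nothing at all.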

This follows yesterday’s monitoring day, where I logged 15 sessions with zero tool calls. The sequence monitor -> adjust -> monitor is the natural loop. You look at what the last run produced, you pick one dimension that looks off, you move one parameter, and you run again. The small size of today’s tool call budget suggests I moved one thing, maybe two, which is the right number for a ratchet iteration.

So What

The two-day pattern of monitor on day one and adjust on day two is the Karpathy ratchet in practice. The daily increments are small enough that any one of them looks like nothing, but they compound. Optimization loops that try to do ten things at once produce a change you cannot interpret. Did the scoring improve because of the threshold tweak, or because of the feature weight shift, or because of the new filter? You have no way to know. A one-change-per-day cadence gives you a clean signal every time.
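One way to hold myself to one change per iteration is a guard that diffs the config before and after and refuses anything other than a single changed key. This is a minimal sketch that assumes the scoring config is a flat dictionary. The helper names and the keys in the example (`threshold`, `feature_weight`, `filter_enabled`) are hypothetical and stand in for the three kinds of change named above.

```python
_MISSING = object()  # sentinel so added or removed keys count as changes


def changed_keys(before: dict, after: dict) -> list[str]:
    """Return the sorted keys whose values differ between two flat configs."""
    keys = set(before) | set(after)
    return sorted(
        k for k in keys
        if before.get(k, _MISSING) != after.get(k, _MISSING)
    )


def single_change(before: dict, after: dict) -> str:
    """Return the one changed key, or raise if the diff is not exactly one key."""
    diff = changed_keys(before, after)
    if len(diff) != 1:
        raise ValueError(f"expected exactly one change, found {len(diff)}: {diff}")
    return diff[0]


# Hypothetical config values, for illustration only.
before = {"threshold": 0.50, "feature_weight": 1.0}
after = {"threshold": 0.55, "feature_weight": 1.0}
print(single_change(before, after))  # threshold
```

Running this as a pre-run check would turn the cadence from a habit into a constraint: a run with two changed keys fails before it can produce an uninterpretable result.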

The engineering lesson embedded in this day is that iteration speed beats iteration size for quality metrics. A week of one-change-per-day beats a single big-bang refactor almost every time, because a big-bang change requires you to hold many hypotheses at once and debug them all together. Small changes let you debug one hypothesis at a time and keep a running log of which ones worked.

The risk of this cadence is that you plateau at a local optimum. Small changes only explore small distances in parameter space. Every few weeks you need to step back and ask whether the metric you are ratcheting on is still the right metric. Otherwise you can optimize yourself into a dead end.

What’s Next

Tomorrow the loop continues: either another monitoring day to see whether the adjustment moved the metric, or another adjustment if the result came back fast enough. The useful discipline is to write down the hypothesis before the change lands, so that post-run I can compare expected versus actual. Without that, the ratchet is just noise. With it, each day produces a small piece of real evidence.
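A hypothesis written before the change could be as small as the record below. It is a sketch under assumptions: the `Iteration` type, its field names, and the verdict rule (the actual change must move in the expected direction and exceed a tolerance) are mine for illustration, and the numbers in the example are placeholders rather than results from the pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Iteration:
    parameter: str                        # the single thing that changed
    hypothesis: str                       # written before the change lands
    expected_delta: float                 # predicted change in the metric
    actual_delta: Optional[float] = None  # filled in after the run

    def verdict(self, tolerance: float = 0.0) -> str:
        """Compare expected versus actual once the run has reported."""
        if self.actual_delta is None:
            return "pending"
        same_direction = self.expected_delta * self.actual_delta > 0
        if same_direction and abs(self.actual_delta) > tolerance:
            return "supported"
        return "rejected"


# Placeholder values, not measured results.
it = Iteration(
    parameter="threshold",
    hypothesis="raising the threshold reduces false positives",
    expected_delta=0.02,
)
print(it.verdict())  # pending
it.actual_delta = 0.03
print(it.verdict())  # supported
```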

I also want to start keeping a tally of which adjustments produced measurable wins versus which produced nothing. Over ten iterations that tally becomes a decision tree for the next pipeline I build.
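The tally itself needs very little machinery. A minimal sketch, assuming each iteration is recorded as a (parameter, result) pair. The `tally` function and the result labels are hypothetical, and the sample entries only show the shape of the data.

```python
from collections import Counter, defaultdict


def tally(outcomes):
    """Count results per parameter from (parameter, result) pairs."""
    table = defaultdict(Counter)
    for parameter, result in outcomes:
        table[parameter][result] += 1
    return {parameter: dict(counts) for parameter, counts in table.items()}


# Placeholder entries to show the shape, not real iteration results.
history = [
    ("threshold", "win"),
    ("threshold", "no effect"),
    ("feature_weight", "win"),
]
for parameter, counts in tally(history).items():
    print(parameter, counts)
```

Grouping by parameter rather than by day is the point: after ten iterations the table shows which knobs are worth reaching for first.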

The part of this day I most want to preserve as a pattern: the willingness to make a single small change and stop. In agent-driven workflows, the temptation is always to pile on. You have the agent, you have context loaded, you can make five changes as easily as one, so why not? The answer is that you cannot interpret the result afterward. One change per iteration is a discipline that pays dividends later, because when the metric moves you know exactly why, and when it does not you know exactly which hypothesis to reject. Every time I violate that rule I end up with a mystery and have to unwind the changes one by one to figure out which did what. The slow path is the fast path.

Log

Sessions: 22 pipeline
Tokens: 5,611 total
Tool calls: 4
Commits: 0
Mode: configuration tweaks
Prior day: 15 monitoring sessions with 0 tool calls