Journal

The Rename That Found a Real Bug


87 sessions and $159.52 across four active days with zero commits. A rename uncovered a production-class concurrency bug that 112 automated reviews and months of testing had not found. The delta: reviews without a hypothesis are noise; a rename forces a hypothesis.

The Rename That Found a Real Bug

The week’s central event was renaming “Hunt Mode” to “Run” across the codebase. 112 automated CDP-assisted review sessions fired against every file the rename touched. The results were not what a cosmetic change would produce.

The old name implied open-ended roaming behavior: hunts share territory. The new name implies a bounded, isolated execution. Auditing whether the code matched the new mental model revealed that the pipeline was sharing state between concurrent runs. That is Hunt behavior. Runs are isolated executions by definition.

The bug had survived months of testing because tests were written when “Hunt” seemed correct. No test asked “do two simultaneous runs share state?” because the mental model didn’t include isolation as a requirement. Renaming forced the question into the open.

The fix required per-run state isolation and per-worker log files for independent auditability of parallel execution. Worker stats display bugs were found and fixed in the same pass. None of this would have surfaced from a test suite that hadn’t updated its model of what the system was supposed to do.
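A minimal sketch of what per-run isolation and per-worker log files could look like, assuming a thread-based pipeline. The names `RunState`, `worker_logger`, `execute_run`, and `run_concurrently` are hypothetical and do not come from the actual codebase. The point is only that each run builds its own state object and each (run, worker) pair writes to its own file.

```python
import logging
import tempfile
import threading
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RunState:
    """State owned by exactly one run; nothing here is module-level."""
    run_id: str
    seen: set = field(default_factory=set)


def worker_logger(log_dir: Path, run_id: str, worker_id: int) -> logging.Logger:
    """Return a logger that writes to its own file, one per (run, worker)."""
    logger = logging.getLogger(f"run.{run_id}.worker.{worker_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of any shared root handler
    if not logger.handlers:
        path = log_dir / f"{run_id}-worker{worker_id}.log"
        logger.addHandler(logging.FileHandler(path))
    return logger


def execute_run(run_id: str, items: list, log_dir: Path) -> RunState:
    state = RunState(run_id)  # fresh state, created inside the run
    log = worker_logger(log_dir, run_id, worker_id=0)
    for item in items:
        state.seen.add(item)
        log.info("processed %s", item)
    return state


def run_concurrently(jobs: dict, log_dir: Path) -> dict:
    """Start every run in its own thread and collect each final state."""
    results = {}

    def target(run_id: str, items: list) -> None:
        results[run_id] = execute_run(run_id, items, log_dir)

    threads = [threading.Thread(target=target, args=job) for job in jobs.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The test that was missing for months is then short: start two runs at once and assert that neither run's state or log contains the other's items.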

This generalizes. When you rename something and the surrounding code resists (method names that no longer fit, comments that contradict the new name, test assertions that pass for the wrong reason), that resistance is signal. The code is telling you it was designed for a different abstraction than the one you’re now naming.

The 112-session automated review is a separate question. Reviews at scale are only valuable when they have a specific hypothesis. “Does this code still make sense under the new name?” is a hypothesis. “Review everything that changed” is not. The former surfaces bugs; the latter produces suggestions that match the old mental model.

The rename day logged 28 sessions, exactly matching the two days before it. A uniform count of 28 across three consecutive days (Tue-Thu) suggests a cap or batch limit rather than organic demand, and is worth tracking as a tool-side pattern.

Transferable insight: Renaming is an audit. When the code resists the new name, you’ve found a design assumption that no longer holds. That is a bug, not a naming conflict.

The 403 That Looked Like Missing Data

Mixin: federated API error masquerade

An entity type IRI prefix mismatch in a tourism data API returned HTTP 403, not 404. The caller concluded the data didn’t exist. The data existed; the request was malformed.

The deceptive part: the /api/count endpoint accepted either prefix and succeeded. The /api/findAll endpoint enforced exact IRI matching and returned 403 on mismatch. Entity counts appeared correct. Entity fetches failed silently. The inconsistent validation behavior across endpoints is what made this hard to diagnose: a caller who verifies access by checking counts will conclude they have permission, then hit 403 on the actual data fetch.

The fix required using the correct IRI prefix and adding prefix detection to pipeline initialization. The pitfall note also documented four canonical @id formats for place entities and two formats that remain forbidden.
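One way prefix detection at pipeline initialization might be sketched is below. This is an illustration under assumptions, not the pipeline's actual code: `detect_iri_prefix`, `PrefixDetectionError`, and the `find_all_status` callable (which stands in for a real request to /api/findAll and returns the HTTP status) are all hypothetical. The probe deliberately goes through the strict endpoint, since /api/count accepts either prefix and so cannot tell them apart.

```python
from typing import Callable, Dict, Iterable


class PrefixDetectionError(RuntimeError):
    """No candidate prefix was accepted by the strict endpoint."""


def detect_iri_prefix(
    candidates: Iterable[str],
    entity_type: str,
    find_all_status: Callable[[str], int],
) -> str:
    """Return the first prefix whose full type IRI the strict endpoint accepts.

    find_all_status is an assumed hook: given a full entity type IRI, it
    returns the HTTP status of a findAll request for that type.
    """
    statuses: Dict[str, int] = {}
    for prefix in candidates:
        status = find_all_status(prefix + entity_type)
        if status == 200:
            return prefix
        statuses[prefix] = status
    # A 403 here may mean "malformed IRI", not "no permission" or "no data",
    # so the error reports every status instead of collapsing them.
    raise PrefixDetectionError(
        f"findAll accepted no candidate prefix for {entity_type!r}: {statuses}"
    )
```

Failing at startup with the full list of statuses keeps a prefix mismatch from being read later as missing data.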

This class of error is particularly dangerous in federated systems where you don’t own the API contract. A permissions error that looks like a data availability error routes debugging effort to the wrong place for a long time.

Transferable insight: When one endpoint succeeds and another returns 403 with identical parameters, check whether the endpoints validate the same fields with the same strictness. Inconsistent validation across endpoints is a common API design flaw.

Lint Before AI Review

Mixin: tiered code quality gates

PeonNotify v0.2.0 shipped the CodeGuard pipeline: automatic lint plus AI debug review on every file write. The architecture decision is tiered gating: lint runs first, AI review only fires if lint passes.

Running LLM review on syntactically broken code wastes context window on obvious syntax errors rather than semantic bugs. The lint-before-AI gate eliminates that waste. Language coverage: JS/TS, Python, Shell, Go, Rust, Ruby, SQL via file-extension detection. Language-specific review prompts replaced a generic “review this code” instruction.
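The gating order can be sketched as follows. This is a hedged illustration, not the CodeGuard implementation: `run_gates`, `GateResult`, the extension table, and the prompt wording are assumptions, and the `lint` and `review` callables stand in for whatever linter and model call the pipeline really uses.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

# Assumed extension table covering the languages listed above.
LANGUAGE_BY_EXTENSION = {
    ".js": "javascript", ".ts": "typescript", ".py": "python",
    ".sh": "shell", ".go": "go", ".rs": "rust", ".rb": "ruby", ".sql": "sql",
}


@dataclass
class GateResult:
    stage: str  # "skipped", "lint_failed", or "reviewed"
    language: Optional[str] = None
    detail: str = ""


def run_gates(
    path: str,
    source: str,
    lint: Callable[[str, str], List[str]],
    review: Callable[[str, str], str],
) -> GateResult:
    """Run the cheap deterministic gate first; call the model only if it passes."""
    language = LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())
    if language is None:
        return GateResult("skipped")  # unsupported file type: no gate runs

    errors = lint(language, source)
    if errors:
        # Stop here so no model context is spent on syntax errors.
        return GateResult("lint_failed", language, "\n".join(errors))

    # Language-specific prompt instead of a generic "review this code".
    prompt = f"Review this {language} code for semantic bugs, not style."
    return GateResult("reviewed", language, review(prompt, source))
```

Because both gates are passed in as callables, the ordering can be tested without a real linter or model: a failing lint stub should leave the review stub uncalled.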

Three new sound categories shipped: never_mind (tool failure: recovery expected), leave_me_alone (system error: needs attention), more_gold (rate limit hit: external constraint, not internal failure). The semantic differentiation at the audio layer mirrors the semantic differentiation at the code quality layer: same design principle, different domain.
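As a small illustration, the three categories amount to a lookup from failure kind to sound. The category names are the ones listed above, while the `FailureKind` enum and `sound_for` helper are hypothetical names chosen for this sketch.

```python
from enum import Enum


class FailureKind(Enum):
    # Hypothetical classification; only the sound names come from the release.
    TOOL_FAILURE = "tool_failure"   # recovery expected
    SYSTEM_ERROR = "system_error"   # needs attention
    RATE_LIMIT = "rate_limit"       # external constraint, not internal failure


SOUND_BY_FAILURE = {
    FailureKind.TOOL_FAILURE: "never_mind",
    FailureKind.SYSTEM_ERROR: "leave_me_alone",
    FailureKind.RATE_LIMIT: "more_gold",
}


def sound_for(kind: FailureKind) -> str:
    """Return the sound category for a classified failure."""
    return SOUND_BY_FAILURE[kind]
```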

Transferable insight: The cheapest gate runs first. Deterministic checks should filter before non-deterministic ones; lint before LLM is the specific instance.

Zeitgeist

@heygurisingh
Google CodeWiki: Paste a GitHub URL and get an interactive guide, architecture diagrams, walkthroughs, and a code-aware chatbot
@jxmnop
Codex 100% accuracy, 343 parameters: Codex one-shots 100% accuracy on 10-digit addition with only 343 parameters using hand-set weights
@N8Programs
Hand-crafted weights beat trained transformer: 343 params, 100% on 10M test cases of 10-digit addition

By the Numbers

| Metric | Value |
| --- | --- |
| Sessions | 87 |
| Compute cost | $159.52 |
| Git commits | 0 |
| Deploys | 0 |
| MQI avg | 0.2188 |
| MQI delta | +0.0292 |
| Cache hit rate | 99.7% |
| Avg cost/session | $1.83 |
| Active days | 4 of 7 |

Changelog

260507: Generated by journalize-weekly (topic-first format, v2 regeneration)