IRI canonicalization: query all records first, apply selection logic second

2026-03-24
Signal
The IRI canonicalization bug, in which filtering by integration=internal-tourism-api missed 73 entities that needed different integration paths, shows that data migration scripts must query for ALL records first and then apply selection logic, rather than filtering at query time.
Evidence
- Project: internal audit: IRI Canonicalization: 926/926 places resolved (100%); 853 → dedup group IRIs, 73 → self-canonical
- Bug found: Original script filtered integration=internal-tourism-api (8,427 entities), missed 73 entities. Fixed to query ALL integrations (19,166 entities) then apply selection logic
- Deliverables: Layer 6 Quality Gate (scripts/final-quality-gate.ts) and final deliverable (client-deliverable-events-final.zip) built
- Project: projects/oil/_index: v17.2 convergence: 5 rounds of quant audit; OOS R²=0.903, one-step MAE=$1.21, direction accuracy 90.5%; frozen parameters: demTh1=115, supPow=0.54
- Project: projects/jobs-apply/_index: 6 interactive sessions; jobs-apply website build, LinkedIn runs, Porkbun DNS setup for jobs-apply.ai domain; 148 automated code-review sessions
So What (Why Should You Care)
The 73 missing entities in the IRI canonicalization are a textbook example of filter-at-query vs. filter-in-application. Querying only integration=internal-tourism-api embedded a business logic assumption directly into the database query. The assumption was: “all entities that need canonicalization have the internal-tourism-api integration label.” That assumption was wrong for 73 entities that had different integration paths.
The fix follows the same pattern each time: query everything that might be relevant, then apply selection logic in application code, where it’s explicit, auditable, and testable. A database query is a poor home for business logic. A filter embedded in a query is hard to test in isolation, harder to review and version as a distinct rule, and fails silently when the assumption it encodes becomes false.
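The pattern can be sketched in TypeScript as follows. This is a minimal illustration, not the project's migration script: the Entity shape, the type and dedupGroupIri fields, the urn:example IRI format, and all function names are assumptions made for the example. The point it shows is that the selection rule is a pure function that never looks at the integration label, so it can be tested without a database.

```typescript
// All field names and IRI formats below are hypothetical, chosen for illustration only.
interface Entity {
  id: string;
  integration: string;     // e.g. "internal-tourism-api"
  type: string;            // assumed discriminator, e.g. "place"
  dedupGroupIri?: string;  // assumed: set when the entity belongs to a dedup group
}

interface Resolution {
  id: string;
  canonicalIri: string;
  kind: "dedup-group" | "self-canonical";
}

// Selection logic as a pure function: explicit, reviewable, and unit-testable.
// It deliberately does NOT look at `integration`.
function selectForCanonicalization(all: readonly Entity[]): Entity[] {
  return all.filter((e) => e.type === "place");
}

// Entities in a dedup group take the group IRI; the rest become self-canonical.
function resolve(e: Entity): Resolution {
  if (e.dedupGroupIri !== undefined) {
    return { id: e.id, canonicalIri: e.dedupGroupIri, kind: "dedup-group" };
  }
  return { id: e.id, canonicalIri: `urn:example:place:${e.id}`, kind: "self-canonical" };
}

function planCanonicalization(all: readonly Entity[]): Resolution[] {
  return selectForCanonicalization(all).map((e) => resolve(e));
}

// Stand-in for the result of an unfiltered query across ALL integrations.
const allEntities: Entity[] = [
  { id: "a", integration: "internal-tourism-api", type: "place", dedupGroupIri: "urn:example:group:1" },
  { id: "b", integration: "other-source", type: "place" },
  { id: "c", integration: "internal-tourism-api", type: "event" },
];

const plan = planCanonicalization(allEntities);
console.log(plan.map((r) => `${r.id}:${r.kind}`).join(", "));
// prints "a:dedup-group, b:self-canonical"
```

Entity "b" is the kind of record a query-time filter on integration would have dropped. Here it is selected because the rule is stated in code rather than implied by the query.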
The scale of the miss matters here too. 73 entities out of 926 total is 7.9%, which is not a rounding error. A 92.1% complete migration would have shipped a deliverable with roughly 1 in 13 places having the wrong IRI format. The Layer 6 Quality Gate built today (scripts/final-quality-gate.ts) is specifically designed to catch this class of near-miss before it becomes a shipped defect.
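A coverage check of this kind can be sketched as below. This is not the contents of scripts/final-quality-gate.ts, which this note does not show. The function names, the CoverageReport shape, and the synthetic place-N IDs are all assumptions. The example reproduces the arithmetic above: 853 of 926 resolved leaves 73 missing and 92.1% coverage.

```typescript
// Hypothetical coverage gate; names and shapes are illustrative assumptions.
interface CoverageReport {
  expected: number;
  resolved: number;
  missing: string[];
  coveragePct: number;
}

// Compare the IDs that should have been resolved against those that were.
function checkCoverage(
  expectedIds: readonly string[],
  resolvedIds: readonly string[],
): CoverageReport {
  const resolved = new Set(resolvedIds);
  const expected = Array.from(new Set(expectedIds)); // de-duplicate expected IDs
  const missing = expected.filter((id) => !resolved.has(id));
  const hit = expected.length - missing.length;
  const coveragePct = expected.length === 0 ? 100 : (hit / expected.length) * 100;
  return { expected: expected.length, resolved: hit, missing, coveragePct };
}

// Fail loudly on anything short of full coverage instead of shipping a near-miss.
function assertFullCoverage(report: CoverageReport): void {
  if (report.missing.length > 0) {
    throw new Error(
      `coverage ${report.coveragePct.toFixed(1)}%: ` +
        `${report.missing.length} of ${report.expected} unresolved`,
    );
  }
}

// Synthetic IDs reproducing the counts in this note: 926 places, 853 resolved.
const expectedIds = Array.from({ length: 926 }, (_, i) => `place-${i}`);
const filteredRun = expectedIds.slice(0, 853);

const report = checkCoverage(expectedIds, filteredRun);
console.log(`${report.missing.length} missing, ${report.coveragePct.toFixed(1)}% coverage`);
// prints "73 missing, 92.1% coverage"
```

Calling assertFullCoverage(report) on this run throws, which is the intended behavior: the gate treats 92.1% as a failure rather than a warning.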
The projects/oil/_index v17.2 convergence story is worth reading alongside the IRI work. Both reached the same milestone today: a defined problem was solved to its practical limits. The oil model hit “practical convergence” at OOS R²=0.903, the point where additional parameter tuning produces diminishing returns. The IRI canonicalization hit 100% coverage, the point where the problem is solved and the remaining work is validation, not implementation. Knowing when to stop is as important as knowing how to start.
The 6 interactive sessions on projects/jobs-apply/_index (website, LinkedIn runs, DNS), which ran in parallel while the other two projects converged, demonstrate the value of having independent workstreams. Each project was at a different phase: oil was converging, IRI was completing, and jobs-apply was deploying infrastructure. None of them blocked the others.
What’s Next
- Validate final deliverable against Layer 6 Quality Gate
- Jobs-apply.ai DNS propagation: verify domain resolves
Log
- internal audit: IRI Canonicalization complete: 926/926 places resolved (100%)
- 853 → dedup group IRIs, 73 → self-canonical
- Bug found: original script filtered integration=internal-tourism-api (8,427 entities), missed 73 self-canonical entities
- Fix: query ALL integrations (19,166 entities), apply selection logic in application code
- Built Layer 6 Quality Gate (scripts/final-quality-gate.ts)
- Built final deliverable (client-deliverable-events-final.zip)
- projects/oil/_index: v17.2 convergence: 5 rounds of quant audit
- Final fixes: pmDamp bug, PM saturation, deEsc bounce, SPR Gaussian
- Frozen parameters: demTh1=115, supPow=0.54: model at practical convergence
- OOS R²=0.903, one-step MAE=$1.21, direction accuracy 90.5%
- projects/jobs-apply/_index: 6 interactive sessions: website build (SaaS dashboard)
- LinkedIn monitoring and application runs
- Porkbun DNS setup for jobs-apply.ai domain
- Continued event pipeline work from prior session (Batch 010)
- 148 automated code-review sessions