Hook Pipeline Data Integrity

What Happened
A full audit of the Claude Code hook infrastructure revealed a chain of data integrity failures across three pipelines:
- Failure manifest ANSI poisoning: failure-manifest-append.sh captured raw terminal output with ANSI escape codes (\x1b[0;31m) embedded in JSON string values. The JSON spec forbids unescaped control characters below 0x20 in strings. 14 entries were malformed, blocking the entire flush pipeline (49 entries stuck unflushed).
- Failure manifest line concatenation: Some entries lacked newline separators, producing two JSON objects on one line. The JSON decoder then saw “extra data” after the first closing brace.
- Public Lab build broken silently: 52 definition files synced from the vault lacked the required category field, so Astro’s Zod schema validation failed on every build. Additionally, 12 experiment files had YAML null for optional string fields; Zod .optional() means “key absent,” not “value is null.”
- Dead project routing: failure-flush.sh routed peon-notify entries to the archived PeonNotify directory instead of rusty-bloomnet. 12+ task files landed in a dead path.
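The two manifest failures are easy to reproduce. The sketch below assumes Python's standard json module as the decoder (the actual flush tooling is not shown here); the sample lines and the decode_error helper are hypothetical, made up for illustration.

```python
import json

# Assumption: a manifest line with a raw ESC byte (0x1b) inside a string value,
# as captured from colored terminal output.
poisoned = '{"error": "\x1b[0;31mbuild failed\x1b[0m"}'

# Assumption: two complete objects on one line, as left by a missing newline.
concatenated = '{"id": 1}{"id": 2}'


def decode_error(line: str) -> str:
    """Return the decoder's error message for a line, or "" if it parses."""
    try:
        json.loads(line)
    except json.JSONDecodeError as exc:
        return exc.msg
    return ""


print(decode_error(poisoned))      # complains about an invalid control character
print(decode_error(concatenated))  # complains about extra data
```

Either error is enough to stop a reader that parses the manifest line by line and aborts on the first failure, which matches the stuck-flush symptom described above.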
Root Cause
Pipeline boundary hygiene: Each failure is a boundary violation:
- Shell output (ANSI codes) leaked into a JSON data store
- YAML null semantics leaked into a Zod string schema
- A project rename (PeonNotify to stella/rusty-bloomnet) wasn’t propagated to the routing table
All three share a pattern: data crossed a boundary without sanitization or validation at the boundary.
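As one illustration of sanitizing at the boundary, the sketch below cleans captured shell output before it becomes a manifest record. It is written in Python rather than the shell used by the real hooks, and the names sanitize and to_manifest_line are hypothetical. It only cleans top-level string values, which is assumed to be enough for a flat record.

```python
import json
import re

# CSI escape sequences such as ESC[0;31m (same shape as the sed pattern
# s/\x1b\[[0-9;]*[a-zA-Z]//g).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[a-zA-Z]")
# Remaining control characters, keeping tab (0x09) and newline (0x0a).
CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def sanitize(text: str) -> str:
    """Strip ANSI sequences first, then any leftover control characters."""
    # Order matters: removing ESC first would leave "[0;31m" residue behind.
    return CONTROL.sub("", ANSI_CSI.sub("", text))


def to_manifest_line(record: dict) -> str:
    """Serialize a flat record as one JSON line with sanitized string values."""
    clean = {k: sanitize(v) if isinstance(v, str) else v for k, v in record.items()}
    # json.dumps escapes any remaining newline or tab, so the result is one line.
    return json.dumps(clean)


print(to_manifest_line({"error": "\x1b[0;31mbuild failed\x1b[0m\r\n", "code": 2}))
```

Serializing through a JSON library, instead of interpolating captured text into a JSON template, is what guarantees the escaping; the regexes only remove noise that has no value in the stored record.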
Fix Applied
- failure-manifest-append.sh: Strip ANSI codes (sed $'s/\x1b\[[0-9;]*[a-zA-Z]//g') and control chars before JSON serialization
- Manifest repair script: JSON decoder walks each line, splits concatenated objects, strips control chars
- Public Lab: Added category field to all 52 definitions via auto-classification from tags
- Experiment frontmatter: Removed YAML null lines (Zod treats absent key as undefined, which is what .optional() wants)
- failure-flush.sh routing was already updated for stella migration (verified: peon-notify|peonnotify|peon maps to rusty-bloomnet)
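The manifest repair step can be sketched as follows. This is not the actual repair script, which is not reproduced here; it is a minimal version assuming Python's json.JSONDecoder.raw_decode, and the function names repair_line and repair_manifest are hypothetical.

```python
import json
import re

ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[a-zA-Z]")
CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def repair_line(line: str) -> list:
    """Recover every JSON object found on one manifest line."""
    # Strip ANSI sequences before control characters, or "[0;31m" would remain.
    text = CONTROL.sub("", ANSI_CSI.sub("", line)).strip()
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(text):
        # raw_decode returns the parsed object and the index where it ended,
        # so a second object on the same line can be picked up from there.
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return objects


def repair_manifest(lines) -> list:
    """Return one clean JSON line per recovered object, skipping blank lines."""
    repaired = []
    for line in lines:
        if line.strip():
            repaired.extend(json.dumps(obj) for obj in repair_line(line))
    return repaired


broken = ['{"id": 1, "msg": "\x1b[0;31mfail\x1b[0m"}{"id": 2}\n', '{"id": 3}\n']
print(repair_manifest(broken))
```

A line that is still unparseable after stripping raises json.JSONDecodeError in this sketch; a real repair pass would probably want to set such lines aside for manual review instead of aborting.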
Additional Findings
- failure-detect-e2e.sh had unbound $jobs variable under set -u (bash declare -A doesn’t initialize in all shells)
- 93 stale codeguard state directories accumulating (cleaned 51, 42 remain from active sessions)
- Temp diagnostic script (/tmp/stella-diag.sh) left in settings.local.json (removed)
- Watchdog log showed “No space left on device” errors from a past disk pressure incident
- obsidian-cron.log uses GNU head -z, which doesn’t exist on macOS BSD
Redemption
49 stuck failure-manifest entries flushed successfully after repair. Public Lab builds passing (192 pages, 3.12s). All hooks tested.
Key Lesson
Validate at every data boundary. Shell output is not JSON-safe. YAML null is not JavaScript undefined. A project rename is a schema migration. The accumulate-then-flush pattern works brilliantly when the accumulator produces valid records; when it doesn’t, the entire pipeline silently stops.
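One way to keep the accumulator honest is to validate each record at append time. The sketch below is an assumption about how such a guard could look, not a description of the existing hook; append_record and the manifest path are hypothetical.

```python
import json
import os


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    # allow_nan=False makes json.dumps raise ValueError on NaN/Infinity,
    # which are not valid JSON.
    line = json.dumps(record, allow_nan=False)
    # Round-trip check: refuse anything the flush step would read back differently.
    if json.loads(line) != record:
        raise ValueError("record does not round-trip through JSON")
    # If an earlier writer left the file without a trailing newline, add one
    # first so two objects never share a line.
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)
            needs_newline = fh.read(1) != b"\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(("\n" if needs_newline else "") + line + "\n")
```

With a guard like this, a bad record fails loudly at the moment it is written, instead of surfacing later as a flush that has quietly stopped.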