Guardrail architectures combining hard-block hooks with memory rules achieve higher reduction rates than memory rules alone, and the majority of catalogued failure modes lack observability data needed to measure effectiveness
hook-hard-block + memory-rule achieves 100% reduction (N=1). Memory-rule-only achieves 33% resolution rate (1/3). 93% of failure modes have no observability data. 12 always-on hooks contribute latency on every tool call.
Changelog
| Date | Summary |
|---|---|
| 2026-05-05 | Initial experiment: full pipeline executed, dashboard generated, 150 studies written |
Hypothesis
We bet that multi-layer guardrail architectures (hook + memory-rule combinations) outperform single-layer interventions (memory-rule alone) at preventing recurrence of agentic failure modes. The reasoning is that memory rules depend on the model’s attention and compliance, which makes them suggestions rather than constraints. Hard-block hooks provide an enforcement ceiling that fires regardless of model reasoning. The combination gives both a “soft nudge” and a “hard wall.”
Secondary hypothesis: the majority of our 150+ catalogued failure modes lack the observability data required to measure effectiveness, meaning we are flying blind on most of our guardrail investment. Without incident telemetry, we cannot distinguish “genuinely rare” from “not instrumented.”
The topics/process-guardrail-taxonomy defines the intervention types. The guardrails/_index catalogs all known failure modes. This experiment tests whether the taxonomy’s layering theory holds empirically.
Method
Design: Interrupted time series (ITS). Each guardrail deployment is an intervention point. The rate is computed as occurrences / sessions in each period (before vs. after the intervention).
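A minimal sketch of the before/after rate comparison, assuming each period is summarized by an occurrence count and a session count. The names `Period` and `reduction_pct` are illustrative and not taken from the pipeline, and returning `None` for a zero baseline is an assumption about how such cases might be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Period:
    occurrences: int
    sessions: int

    @property
    def rate(self) -> float:
        # Rate as defined in the design: occurrences / sessions.
        return self.occurrences / self.sessions if self.sessions else 0.0


def reduction_pct(before: Period, after: Period) -> Optional[float]:
    """Percent drop in rate from the pre- to the post-intervention period.

    Returns None when the baseline rate is zero (assumed convention),
    because a relative reduction is undefined in that case.
    """
    if before.rate == 0:
        return None
    return 100.0 * (before.rate - after.rate) / before.rate


# Gmail MCP guard figures as reported in the findings of this study.
before = Period(occurrences=5, sessions=690)
after = Period(occurrences=0, sessions=81)
print(round(before.rate, 4), reduction_pct(before, after))
```

A zero-baseline mode yields no reduction figure under this convention, which is one way a mode can have a resolved status without a meaningful reduction percentage.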
Pipeline stages:
- Session Census (`stage1_session_census.py`): Parse `~/.claude/history.jsonl` for daily session counts. Output: 100 days, 771 session-days.
- Failure Mode Extraction (`stage2_failure_modes.py`): Deduplicate failure modes from memory system (75 files) + vault pitfalls (81 files) using `SequenceMatcher > 0.6`. Output: 150 unique failure modes.
- Intervention Timeline (`stage3_interventions.py`): Reconstruct when each guardrail was deployed from git history + file creation dates. Classify type (hook-hard-block, hook-advisory, hook-telemetry, memory-rule, policy-config) and activation pattern (always-on, session-boundary, process-triggered, event-reactive, passive). Output: 123 interventions.
- Incident Reconstruction (`stage4_incidents.py`): Parse guard JSONL logs (blocked attempts), vault pitfall incident arrays (occurrences), memory narrative incidents. Output: 131 incidents (59 blocked, 72 occurred).
- Effectiveness Computation (`stage5_effectiveness.py`): Join failure modes + interventions + incidents + session census. Compute per-period rates. Determine status (resolved/mitigated/open/no-data). Output: 150 effectiveness scores.
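The deduplication step in stage 2 can be sketched as follows. This is a hedged illustration, not the contents of `stage2_failure_modes.py`: the greedy keep-first strategy, the lowercase normalization, and the function name `dedupe` are all assumptions. Only the use of `SequenceMatcher` with a 0.6 cutoff comes from the description above.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.6  # similarity cutoff quoted for stage 2


def dedupe(names: list[str], threshold: float = THRESHOLD) -> list[str]:
    """Greedy dedup: keep a name only if it is not similar to any kept name."""
    kept: list[str] = []
    for name in names:
        normalized = name.lower().strip()
        is_duplicate = any(
            SequenceMatcher(None, normalized, k.lower().strip()).ratio() > threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(name)
    return kept


# Hypothetical failure-mode titles, for illustration only.
titles = [
    "gmail send without approval",
    "Gmail send without approval ",
    "zzz",
]
print(dedupe(titles))
```

Greedy comparison is quadratic in the number of titles, which is unlikely to matter at the scale of 156 input files. The result depends on input order, so sorting the inputs first would make runs reproducible.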
Generators:
- `generate_studies.py`: One vault study entry per failure mode (150 files)
- `generate_dashboard.py`: Aggregate 5-section dashboard
Validation: 24 unit tests + full e2e pipeline run.
Results
Status Distribution (N=150)
| Status | Count | % |
|---|---|---|
| no-data | 140 | 93.3% |
| open | 7 | 4.7% |
| resolved | 3 | 2.0% |
| mitigated | 0 | 0.0% |
Architecture Pattern Performance
| Pattern | Avg Reduction | Resolved |
|---|---|---|
| hook-hard-block + memory-rule | 100.0% | 1/1 |
| hook-advisory + hook-telemetry + memory-rule + policy-config | 100.0% | 1/1 |
| memory-rule only | 0.0% | 1/3 |
| hook-hard-block + memory-rule + policy-config | 0.0% | 0/1 |
Key Numbers
- 771 sessions audited (100 days)
- 150 failure modes catalogued
- 123 interventions mapped (32 unmapped)
- 131 incidents reconstructed
- 12 always-on hooks (latency on every tool call)
- 1 pruning candidate (ralph-loop: never fired)
Findings
1. Multi-layer > Single-layer (Hypothesis Confirmed)
hook-hard-block + memory-rule achieves 100% reduction, though on a single measured failure mode (N=1). The hard-block provides the enforcement ceiling that memory rules cannot guarantee. The Gmail MCP guard is the strongest example: 5 occurrences in 690 sessions (baseline rate 0.0072), zero in the 81 sessions post-guard, and 4 blocked attempts showing that the hook is load-bearing.
2. Memory Rules Alone Are Necessary But Insufficient
Memory rules are the dominant intervention type (75 of 123 interventions, 61%). Used alone, they achieve a 33% resolution rate (1 of 3 resolved). For zero-baseline failure modes, their value lies in preventing a first occurrence, a claim the data cannot falsify. They are the cheapest intervention but provide no enforcement guarantee.
3. The Coverage Gap Is the Real Problem
93% of failure modes (140 of 150) are in the no-data state. We cannot distinguish “genuinely rare” from “not instrumented.” The pipeline itself is the first step in closing this gap: we now know which modes to instrument next.
4. Smoke Detector Paradox
guard-db-destruction shows 0 baseline + 0.0523 current rate. This is NOT regression: the hook surfaced a latent failure that wasn’t previously measured. The high post-deployment rate is evidence the guard is working, not failing.
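One way to keep this case from being misread as a regression is to label it separately when comparing rates. The sketch below is an assumption about how that could be done: the function name, the labels, and the `measured_before_guard` flag are hypothetical and do not come from `stage5_effectiveness.py`.

```python
def interpret_change(
    baseline_rate: float,
    current_rate: float,
    measured_before_guard: bool,
) -> str:
    """Label a before/after rate pair for one failure mode.

    measured_before_guard is a hypothetical flag: True if incidents were
    already being logged before the guard shipped, False if the guard
    itself is the first source of telemetry.
    """
    if current_rate > baseline_rate:
        if baseline_rate == 0 and not measured_before_guard:
            # Smoke detector case: the guard surfaced a latent failure.
            return "newly-observed"
        return "regression"
    if current_rate < baseline_rate:
        return "improved"
    return "unchanged"


# guard-db-destruction figures from the finding above.
print(interpret_change(0.0, 0.0523, measured_before_guard=False))
```

The flag has to come from the intervention timeline, since the rates alone cannot tell a latent failure from a new one.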
5. One Pruning Candidate
The Ralph loop termination guard has never fired despite active interventions. We recommend demoting it to advisory or removing it, since zero incidents do not justify the overhead.
Next Steps
- Instrument high-severity no-data modes: Add session-tagging for the 7 open + critical/high no-data failure modes to confirm true absence vs measurement gap
- Add timing data to latency budget: Parse `bloomnet-hooks.jsonl` for per-hook p50/p95/p99 latency to identify optimization targets
- Promote the db-destruction pattern: Deploy hook-hard-block + memory-rule for remaining open/critical failure modes (6 candidates)
- Prune ralph-loop guard: Demote to advisory after 90 more sessions with zero incidents
- Automate re-runs: Add cron job to re-run pipeline weekly and diff against previous dashboard
- Close the 32 unmapped interventions: Map remaining memory rules to failure modes or retire them
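The latency step above could start from a sketch like the following. The schema of `bloomnet-hooks.jsonl` is not documented here, so the field names `hook` and `duration_ms` are assumptions, as are the function names. The nearest-rank percentile is one of several common definitions.

```python
import json
import math
from collections import defaultdict
from typing import Iterable


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_summary(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Group durations by hook name and report p50/p95/p99 per hook."""
    by_hook: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        # "hook" and "duration_ms" are assumed field names.
        by_hook[record["hook"]].append(float(record["duration_ms"]))

    summary: dict[str, dict[str, float]] = {}
    for hook, values in by_hook.items():
        values.sort()
        summary[hook] = {f"p{p}": percentile(values, p) for p in (50, 95, 99)}
    return summary


# Synthetic records for illustration; real input would be the log file's lines.
sample = [json.dumps({"hook": "guard-a", "duration_ms": ms}) for ms in range(1, 101)]
print(latency_summary(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly once the real field names are confirmed.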