Guardrail architectures combining hard-block hooks with memory rules a
hook-hard-block + memory-rule achieves 100% reduction (N=1). Memory-rule-only achieves a 33% resolution rate (1/3). 93% of
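A minimal sketch of how a hard-block hook differs from a memory rule. It assumes the hook inspects each proposed command before it runs and refuses matches outright, while a memory rule is only advisory text placed in the agent's context. The names `HARD_BLOCK_PATTERNS`, `MEMORY_RULES`, `check_command`, and `build_prompt` are hypothetical; the source does not describe the actual hook interface.

```python
import re
from dataclasses import dataclass

# Hypothetical hard-block patterns: any match stops the command before it runs.
HARD_BLOCK_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),
    # Block bare --force but not --force-with-lease.
    re.compile(r"\bgit\s+push\s+--force(?![-\w])"),
]

# Hypothetical memory rules: advisory text only, nothing enforces them.
MEMORY_RULES = ["Prefer `git push --force-with-lease` over `--force`."]


@dataclass
class Verdict:
    allowed: bool
    reason: str


def check_command(command: str) -> Verdict:
    """Hard-block layer: deterministic, independent of what the agent decides."""
    for pattern in HARD_BLOCK_PATTERNS:
        if pattern.search(command):
            return Verdict(False, f"blocked by hook: {pattern.pattern}")
    return Verdict(True, "no hard-block match")


def build_prompt(task: str) -> str:
    """Memory-rule layer: rules are injected as context the agent may still ignore."""
    rules = "\n".join(f"- {rule}" for rule in MEMORY_RULES)
    return f"Rules:\n{rules}\n\nTask: {task}"
```

In this split the hook's outcome does not depend on the agent's compliance, while the memory rule explains the preferred alternative; with N=1 and N=3, the numbers above are suggestive rather than conclusive.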
Hypothesis-driven tests with systematic execution. Each experiment documents what was tried, what surprised us, and what to try next.
57 scorecards across 32 systems. Rate-limit pitfall discovered: dispatching 53 subagents in one wave produced 0/53 score
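One way to avoid the single-wave rate-limit failure is to cap how many subagents run at once. A minimal sketch using `asyncio.Semaphore`, assuming each subagent is an async callable; `run_subagent` and the default limit of 8 are hypothetical placeholders, not values from the source.

```python
import asyncio


async def run_subagent(task_id: int) -> str:
    # Hypothetical stand-in for a real subagent call against a rate-limited API.
    await asyncio.sleep(0.01)
    return f"scorecard-{task_id}"


async def dispatch(task_ids, max_concurrent: int = 8):
    """Run every task, but never more than max_concurrent at the same time."""
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0  # highest concurrency actually observed, for checking the cap

    async def guarded(task_id):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_subagent(task_id)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as task_ids.
    results = await asyncio.gather(*(guarded(t) for t in task_ids))
    return results, peak
```

For example, `asyncio.run(dispatch(range(53), max_concurrent=8))` still produces 53 results, but the API only ever sees at most 8 concurrent requests instead of 53.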
Approach C delivered: 711 frames indexed, hybrid search working, 72 scanners running, 86% audit pass rate on first run.
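The entry does not say how the hybrid search merges keyword and vector results. Reciprocal rank fusion (RRF) is one common choice; the sketch below assumes two ranked lists of frame IDs and the conventional constant k = 60. The frame IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["frame-12", "frame-7", "frame-3"]
vector_hits = ["frame-7", "frame-44", "frame-12"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['frame-7', 'frame-12', 'frame-44', 'frame-3']
```

RRF only needs ranks, not comparable scores, which is why it is a convenient default when one list comes from keyword matching and the other from embedding similarity.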
Screenshot automation catches rendering bugs that unit tests miss. 2x retina resolution reveals sub-pixel issues. Rollin
Pipeline operational with all 4 sources. Sub-project merging eliminates double-counting. Proportional distribution preve
Canvas 2D eliminated WebGL context issues, simplified the rendering pipeline, and enabled L-system botanical encoding th
6/6 = 100% after select fix deployed. 9/13 total for the day (3/7 before fixes, 6/6 after). From Run 1 (40%) to Run 10 (
2/3 = 67% during quiet hours. 1 failure: 240s timeout on screening question dropdown (Databricks/PySpark). CDP select de
All three failure patterns addressed and deployed. F40 scroll fix prevents modal-bottom submit button from being out of
All 6 subagents complete. LinkedIn scan revealed 0 interview signals from 105 submitted applications. All 12 recruiter m
5/6 = 83% success. All 6 attempts opened the modal on strategy 1 (first try). 1 failure: Save Application dialog blocked adv
11 new table columns. Priority scoring operational. Direct channel submission rate stable at 77.9% (113/145). Cross-chan
System fully operational. 18 events verified in Neon via browser test. 8 experiments running. Consent gate unified. Thre
592 tests passing. Score audit table logs ALL scores including sub-70 rejections. 3,742 legacy scores backfilled from ar
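Writing the audit row before the threshold decision is what lets sub-70 rejections appear in the audit table. A minimal sketch with the standard-library `sqlite3` module; the table name `score_audit`, its columns, and `record_score` are assumptions for illustration, not the project's schema.

```python
import sqlite3

THRESHOLD = 70  # scores below this are rejected, but still audited


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS score_audit ("
        " item_id TEXT NOT NULL, score INTEGER NOT NULL, accepted INTEGER NOT NULL)"
    )


def record_score(conn: sqlite3.Connection, item_id: str, score: int) -> bool:
    """Insert the audit row for every score, then return the accept decision."""
    accepted = score >= THRESHOLD
    conn.execute(
        "INSERT INTO score_audit (item_id, score, accepted) VALUES (?, ?, ?)",
        (item_id, score, int(accepted)),
    )
    return accepted


conn = sqlite3.connect(":memory:")
init_db(conn)
for item, score in [("a", 82), ("b", 55), ("c", 70)]:
    record_score(conn, item, score)
print(conn.execute("SELECT COUNT(*), SUM(accepted) FROM score_audit").fetchone())
# (3, 2)
```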
Account unrestricted since 2026-03-26. Run 7 (2026-03-29) achieved 83% with all anti-detection measures active. P0-P1 (i
CDP mode disables ALL stealth scripts, relying entirely on real Chrome session. 22-38s click gaps were pure LLM inferenc
7 runs, 40% to 83% final. 26 individual failure fixes (F1-F39). Account restriction in Run 6 was the critical learning.
679 sessions (522 main + 157 worktree) in 2 days. Agents successfully parallelized adapter work. Merged into baseline co
Zero consumer changes needed. Rate limiting invisible inside provider layer. Prevented cost blowouts during early develo
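Placing the limiter inside the provider layer means callers keep calling the same method and never see the throttling. A minimal token-bucket sketch, assuming a synchronous provider with a `complete(prompt)` method; `TokenBucket`, `RateLimitedProvider`, `EchoProvider`, and any rates used with them are hypothetical.

```python
import threading
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic, sleep=time.sleep):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.clock, self.sleep = clock, sleep
        self.last = clock()
        self.lock = threading.Lock()  # waiters are served one at a time

    def acquire(self) -> None:
        with self.lock:
            while True:
                now = self.clock()
                # Refill in proportion to elapsed time, capped at capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep just long enough for one token to accrue.
                self.sleep((1 - self.tokens) / self.rate)


class EchoProvider:
    """Hypothetical provider used only to show the wrapper."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class RateLimitedProvider:
    """Same interface as the wrapped provider, so consumers need no changes."""

    def __init__(self, inner, bucket: TokenBucket):
        self.inner, self.bucket = inner, bucket

    def complete(self, prompt: str) -> str:
        self.bucket.acquire()
        return self.inner.complete(prompt)
```

Injecting `clock` and `sleep` keeps the limiter testable without real waiting; a consumer simply receives a `RateLimitedProvider` wherever it previously received the raw provider.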
Vision approach worked for page understanding but was eventually replaced by DOM-based extraction for form filling. Visi
16/16 pass (14 + 2 conditional skips). Two product bugs found and fixed: destroy-rebuild in MascotBar races Playwright l
14.8k total lines of Rust. All 3 crates compile clean. Ownership model eliminates PTY race conditions by design. Phase 4
Full terminal fidelity achieved via PTY-based execution. XState v5 state machines manage agent lifecycle. WebSocket bina
Fixed spawn order eliminates race conditions. Users can dynamically scale from 1 to 4 agents. Each role has distinct capabi
Fixed spawn order eliminates race conditions in process initialization. WS protocol enables real-time status updates. UI
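A minimal sketch of fixed-order spawning, assuming each agent's startup returns only once it is ready to receive work. The role names in `SPAWN_ORDER` and the `Agent` class are hypothetical, since the source does not list the roles. Starting agents one at a time in a fixed sequence removes the startup interleavings that cause initialization races, at the cost of slower total startup.

```python
import asyncio

# Hypothetical role order; the actual roles are not named in the source.
SPAWN_ORDER = ["planner", "coder", "reviewer", "tester"]


class Agent:
    def __init__(self, role: str):
        self.role = role
        self.ready = False

    async def start(self, started: list) -> None:
        # Stand-in for real startup (process spawn, PTY setup, handshake).
        # Returns only once the agent is ready to receive work.
        await asyncio.sleep(0.01)
        self.ready = True
        started.append(self.role)


async def spawn_agents(count: int):
    """Start `count` agents (1 to 4) strictly in SPAWN_ORDER, one at a time."""
    if not 1 <= count <= len(SPAWN_ORDER):
        raise ValueError(f"count must be between 1 and {len(SPAWN_ORDER)}")
    started, agents = [], []
    for role in SPAWN_ORDER[:count]:
        agent = Agent(role)
        await agent.start(started)  # the next spawn waits until this one is ready
        agents.append(agent)
    return agents, started
```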
260414b final. Three-layer fix: (1) Calibration-proper scoring eliminated Goodhart gaming (calPen 0.216→0.060). (2) Pro
R²=0.9752, MAPE=1.49%, OOS R²=0.9755. One-step-ahead: MAE=$0.833, RMSE=$1.198, direction accuracy=93.8%. B
Lowest one-step MAE: $0.833 (13% better than v14b). Direction accuracy 93.8% (15/16). Sell model: 100% trigger accuracy,
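The fit and forecast metrics quoted in these entries have standard definitions. A small sketch of how they might be computed from paired actual and predicted prices, assuming plain Python lists. Direction accuracy is assumed here to mean the share of steps where the predicted move from the previous actual price has the same sign as the actual move; the source does not spell out its exact definition.

```python
import math


def regression_metrics(actual, predicted):
    """R², MAPE (%), MAE, RMSE, and direction accuracy for paired series."""
    n = len(actual)
    mean = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    mape = 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    # A hit needs predicted and actual changes (from actual[t-1]) of the same
    # nonzero sign; a predicted change of exactly zero counts as a miss.
    hits = sum(
        1
        for t in range(1, n)
        if (predicted[t] - actual[t - 1]) * (actual[t] - actual[t - 1]) > 0
    )
    direction = hits / (n - 1)
    return {"r2": r2, "mape": mape, "mae": mae, "rmse": rmse, "direction": direction}
```

Under this definition, 15 correct directions out of 16 steps gives 15/16 = 0.9375, the 93.8% quoted above.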
v17.2 current. Cycle 3 example: R² 0.9610→0.9769 (+0.0159), MAPE 1.85%→1.26% (-0.59pp). 12 parameters accepted, 3 reject
R² 0.9684 (+0.68pp). MAPE 1.57% (-0.32pp). One-step MAE $1.09 (-$0.54, 33% improvement). Direction accuracy 86.7%→93.3%
MAE $0.951 (42% improvement from v13's $1.633). Autocorrelation 0.1651→-0.001 (eliminated). OOS R² 0.9699. P($150) eleva
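A residual autocorrelation near zero means the previous step's error no longer predicts the next one. A minimal sketch, assuming the reported figure is the standard lag-1 sample autocorrelation of one-step residuals; the source does not state the lag.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation: sum((e_t - m)(e_{t-1} - m)) / sum((e_t - m)^2)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant residuals: no variation to correlate
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return num / denom
```

Positive values, like the earlier 0.1651, indicate errors that tend to persist in one direction; values near zero indicate the remaining errors look roughly independent from step to step.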
R² 0.9498→0.9616 (+1.18pp). MAPE 1.97%→1.89%. OOS R² 0.8633→0.9131 (+4.98pp). P($120+) 15.4%→36.4%. Historical pass rate
Orphaned notes eliminated. All session-generated notes now land in correct vault directories. The _obsidian_dir() mappin
286 sessions indexed from JSONL backfill. Cursor-based processing enables incremental updates without reprocessing. Sess
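A minimal sketch of cursor-based incremental processing, assuming the cursor is the byte offset already consumed in an append-only JSONL file and is stored in a small side file. The file names and the `process_new_records` helper are hypothetical.

```python
import json
import os
from pathlib import Path


def process_new_records(log_path: Path, cursor_path: Path, handle) -> int:
    """Pass only records appended since the last run to `handle`; return the count."""
    offset = int(cursor_path.read_text()) if cursor_path.exists() else 0
    count = 0
    with log_path.open("rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break  # partial line still being written; pick it up next run
            handle(json.loads(line))
            offset += len(line)
            count += 1
    # Persist the cursor only after the records were handled, via an atomic rename.
    tmp = cursor_path.with_suffix(".tmp")
    tmp.write_text(str(offset))
    os.replace(tmp, cursor_path)
    return count
```

Because the cursor is advanced only past complete, handled lines, rerunning after new sessions arrive touches just the new tail rather than reprocessing the whole file.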
Per-write processing was too expensive, confirming the hypothesis was partially wrong. An accumulate-then-flush pattern solved it:
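The accumulate-then-flush pattern buffers writes in memory and processes them together once a size or age limit is reached. A minimal sketch, assuming a synchronous `flush_fn` that handles a whole batch; the `BatchBuffer` name and both limits are hypothetical, not the values used in the experiment.

```python
import time


class BatchBuffer:
    """Collect items and hand them to flush_fn in batches instead of one at a time."""

    def __init__(self, flush_fn, max_items: int = 50, max_age_s: float = 5.0, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_items = max_items
        self.max_age_s = max_age_s
        self.clock = clock
        self.items = []
        self.first_at = None  # time the oldest buffered item arrived

    def add(self, item) -> None:
        if not self.items:
            self.first_at = self.clock()
        self.items.append(item)
        too_many = len(self.items) >= self.max_items
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Process everything buffered so far in one call; safe to call when empty."""
        if self.items:
            batch, self.items = self.items, []
            self.flush_fn(batch)
```

The age limit is only checked when an item arrives, so a caller should also invoke `flush()` on shutdown (or from a timer) so the last partial batch is not lost.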
PR #991 merged same-cycle with zero core patches required. The Zalo messaging adapter: a protocol most North American de
308 commits in 60 minutes across 4 sessions ($2.76). The gateway shipped and was immediately exercised by the sandbox ex
Native app achieved 99.2% uptime over 14 days vs 91.4% for headless. Eliminated 3 failure classes: GPU rendering crashes
Heartbeat detected phone-offline state an average of 6.2 hours before token expiration. Zero false negatives over 21-day
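Heartbeat detection treats a device as offline once no heartbeat has arrived within a timeout. A minimal sketch, assuming heartbeats are timestamped in seconds on a single monotonic clock; `HeartbeatMonitor` and the 300 s default timeout are hypothetical, not the experiment's values.

```python
class HeartbeatMonitor:
    """Marks a device offline when the gap since its last heartbeat exceeds timeout_s."""

    def __init__(self, timeout_s: float = 300.0):
        self.timeout_s = timeout_s
        self.last_seen = {}  # device_id -> timestamp of last heartbeat

    def beat(self, device_id: str, now: float) -> None:
        self.last_seen[device_id] = now

    def offline(self, now: float):
        """Return the IDs of devices whose last heartbeat is older than the timeout."""
        return sorted(d for d, t in self.last_seen.items() if now - t > self.timeout_s)
```

In this design the worst-case detection delay is the timeout plus the polling interval of whatever calls `offline()`.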
Auto-recovery handled 94% of disconnections without human intervention. Remaining 6% were auth token expirations requiri
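A minimal sketch of the split described above: transient disconnections are retried automatically with exponential backoff, while auth-token expirations are escalated to a human because retrying cannot fix them. The exception classes, the `reconnect` callable, and the retry limits are hypothetical.

```python
import time


class TransientDisconnect(Exception):
    """Network drop or similar; retrying may succeed."""


class AuthExpired(Exception):
    """Token expired; a human has to re-authenticate."""


def recover(reconnect, max_attempts: int = 5, base_delay_s: float = 1.0, sleep=time.sleep) -> str:
    """Return 'recovered' or 'needs_human'; the delay doubles after each transient failure."""
    for attempt in range(max_attempts):
        try:
            reconnect()
            return "recovered"
        except AuthExpired:
            return "needs_human"  # retrying cannot fix an expired token
        except TransientDisconnect:
            if attempt < max_attempts - 1:  # no pointless wait after the last try
                sleep(base_delay_s * 2 ** attempt)
    return "needs_human"  # transient failures outlasted the retry budget
```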