Experiments

Hypothesis-driven tests with systematic execution. Each experiment documents what was tried, what surprised us, and what to try next.

Confirmed: 43 (67% success rate)
Refuted: 0
Pending: 12
Total: 64 across 14 projects

bloomnet (9)

Experiment confirmed bloomnet May 4, 2026

Guardrail architectures combining hard-block hooks with memory rules a

hook-hard-block + memory-rule achieves 100% reduction (N=1). Memory-rule-only achieves 33% resolution rate (1/3). 93% of

Experiment pending bloomnet Apr 23, 2026

Adding Distribution and Semantic-Builders leagues plus factoring Matur

Experiment confirmed bloomnet Apr 19, 2026

Four-league structure (Harness/Vault/Skills/Tools) with one subagent p

57 scorecards across 32 systems. Rate-limit pitfall discovered: dispatching 53 subagents in one wave produced 0/53 score

Experiment confirmed bloomnet Apr 7, 2026

Building our own frame-graph storage in SQLite with tantivy BM25 and H

Approach C delivered: 711 frames indexed, hybrid search working, 72 scanners running, 86% audit pass rate on first run.

Experiment pending bloomnet Apr 3, 2026

Applying the Karpathy LLM Knowledge Base compilation pattern to a subs

Experiment pending bloomnet Apr 3, 2026

A rolling z-score anomaly detector (2-sigma threshold) on Claude Code
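For orientation, here is a minimal sketch of a rolling z-score detector with a 2-sigma threshold. The window size, the function name, and the choice to compare each point against the preceding window are assumptions for illustration; the experiment entry does not specify them.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=2.0):
    """Return indices of values that sit more than `threshold` standard
    deviations from the mean of the preceding `window` values.

    Each value is scored against the points *before* it, so an outlier
    does not inflate its own baseline. (Assumed design, not the original's.)
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows, where the z-score is undefined.
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(x)
    return anomalies
```

One consequence of this design: once an outlier enters the window it widens the standard deviation, so the next few points are harder to flag until it ages out.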

Experiment confirmed bloomnet Mar 24, 2026

Puppeteer-driven screenshot capture loops will catch visual regression

Screenshot automation catches rendering bugs that unit tests miss. 2x retina resolution reveals sub-pixel issues. Rollin

Experiment confirmed bloomnet Mar 19, 2026

A 4-source ingestion pipeline (JSONL sessions, stats-cache, history.js

Pipeline operational with all 4 sources. Sub-project merging eliminates double-counting. Proportional distribution preve

Experiment confirmed bloomnet Mar 14, 2026

Canvas 2D rendering will provide better performance and simpler archit

Canvas 2D eliminated WebGL context issues, simplified the rendering pipeline, and enabled L-system botanical encoding th

jobs-apply (23)

Experiment inconclusive jobs-apply Apr 27, 2026

ToFu visitors on About are ready for BoFu action without more mid-funn

Experiment inconclusive jobs-apply Apr 27, 2026

Changelog readers who see shipping momentum respond to urgency framing

Experiment inconclusive jobs-apply Apr 27, 2026

Contact page visitors respond better to direct download than gentle fu

Experiment inconclusive jobs-apply Apr 27, 2026

Concrete metrics build more credibility than broad feature claims

Experiment inconclusive jobs-apply Apr 27, 2026

More frequent CTAs mid-scroll catch visitors before bounce, shorter pa

Experiment inconclusive jobs-apply Apr 27, 2026

Leading with download CTA converts higher-intent visitors who'd bounce

Experiment inconclusive jobs-apply Apr 27, 2026

Lower price anchor reduces sticker shock, action-oriented copy outperf

Experiment inconclusive jobs-apply Apr 27, 2026

Trust-primed visitors who read the full security page are ready to con

Experiment pending jobs-apply Apr 3, 2026

Extracting the gaussian behavioral timing module from jobs-apply into

Experiment confirmed jobs-apply Apr 1, 2026

Deploying 10 targeted fixes across easy-apply.ts, linkedin-adapter.ts,

6/6 = 100% after select fix deployed. 9/13 total for the day (3/7 before fixes, 6/6 after). From Run 1 (40%) to Run 10 (

Experiment confirmed jobs-apply Apr 1, 2026

Proactive modal scrolling and CDP select placeholder detection will re

2/3 = 67% during quiet hours. 1 failure: 240s timeout on screening question dropdown (Databricks/Pyspark). CDP select de

Experiment confirmed jobs-apply Mar 31, 2026

Fixing silent modal failures (F40), verification false negatives (F41)

All three failure patterns addressed and deployed. F40 scroll fix prevents modal-bottom submit button from being out of

Experiment inconclusive jobs-apply Mar 28, 2026

A 6-subagent Karpathy ratchet targeting interview conversion rate can

All 6 subagents complete. LinkedIn scan revealed 0 interview signals from 105 submitted applications. All 12 recruiter m

Experiment confirmed jobs-apply Mar 28, 2026

A click-and-verify loop that checks modal state after each click strat

5/6 = 83% success. All 6 attempts opened modal on strategy 1 (first try). 1 failure: Save Application dialog blocked adv
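The click-and-verify idea can be sketched roughly as follows. The strategy callables and the `modal_is_open` check are hypothetical stand-ins; the real adapter's API is not shown in this entry.

```python
def click_and_verify(strategies, modal_is_open):
    """Try each click strategy in order and verify the modal after each one.

    `strategies` is a list of zero-argument callables that attempt a click;
    `modal_is_open` is a zero-argument callable returning True once the
    modal is visible. Both are hypothetical placeholders.

    Returns the 1-based index of the first strategy that worked,
    or None if none of them opened the modal.
    """
    for index, click in enumerate(strategies, start=1):
        click()
        if modal_is_open():
            return index
    return None
```

Returning the winning strategy's index makes it possible to log results such as "opened modal on strategy 1", as reported above.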

Experiment confirmed jobs-apply Mar 28, 2026

Database-backed company intelligence with priority scoring improves Di

11 new table columns. Priority scoring operational. Direct channel submission rate stable at 77.9% (113/145). Cross-chan

Experiment confirmed jobs-apply Mar 27, 2026

Event-driven A/B testing with auto-promotion will produce measurable c

System fully operational. 18 events verified in Neon via browser test. 8 experiments running. Consent gate unified. Thre

Experiment confirmed jobs-apply Mar 25, 2026

A three-layer scoring system (semantic embeddings + structured feature

592 tests passing. Score audit table logs ALL scores including sub-70 rejections. 3,742 legacy scores backfilled from ar

Experiment confirmed jobs-apply Mar 24, 2026

A multi-iteration anti-detection suite (gaussian timing, reading simul

Account unrestricted since 2026-03-26. Run 7 (2026-03-29) achieved 83% with all anti-detection measures active. P0-P1 (i

Experiment confirmed jobs-apply Mar 21, 2026

Connecting to the user's real Chrome via CDP will be more resistant to

CDP mode disables ALL stealth scripts, relying entirely on real Chrome session. 22-38s click gaps were pure LLM inferenc

Experiment confirmed jobs-apply Mar 14, 2026

Systematic fix of individual failure points will drive LinkedIn Easy A

7 runs, 40% to 83% final. 26 individual failure fixes (F1-F39). Account restriction in Run 6 was the critical learning.

Experiment confirmed jobs-apply Mar 8, 2026

Deploying 4 parallel Claude Code agents in isolated git worktrees can

679 sessions (522 main + 157 worktree) in 2 days. Agents successfully parallelized adapter work. Merged into baseline co

Experiment confirmed jobs-apply Feb 26, 2026

A provider-transparent rate limiter keyed by API key + provider can pr

Zero consumer changes needed. Rate limiting invisible inside provider layer. Prevented cost blowouts during early develo
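A minimal sketch of a limiter keyed by provider and API key is shown below. It uses a sliding-window count, which is an assumption: the entry does not say which algorithm the project uses. The class name, method name, and parameters are illustrative.

```python
import time
from collections import defaultdict, deque


class KeyedRateLimiter:
    """Sliding-window limiter with one window per (provider, api_key) pair.

    Illustrative only; the algorithm and names are assumptions.
    """

    def __init__(self, max_calls, per_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock  # injectable so tests can control time
        self._calls = defaultdict(deque)  # (provider, api_key) -> timestamps

    def try_acquire(self, provider, api_key):
        """Record a call and return True if the pair is under its limit."""
        now = self.clock()
        calls = self._calls[(provider, api_key)]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.per_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

If the provider layer calls `try_acquire` before forwarding each request, callers of that layer need no changes, which is one way to get the "invisible inside provider layer" property described above.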

Experiment confirmed jobs-apply Feb 25, 2026

Gemini Flash vision can analyze ATS page screenshots to extract form s

Vision approach worked for page understanding but was eventually replaced by DOM-based extraction for form filling. Visi

dakka (6)

media-diet (2)

apple-photos (1)

oil (7)

Experiment confirmed oil Apr 13, 2026

Replacing uniform Brier reference ($90-$220) with market-implied midpo

260414b final. Three-layer fix: (1) Calibration-proper scoring eliminated Goodhart gaming (calPen 0.216->0.060). (2) Pro

Experiment confirmed oil Mar 17, 2026

A 3-round quantitative audit with one-step-ahead forecasting, multi-se

R-squared=0.9752, MAPE=1.49%, OOS R-squared=0.9755. One-step-ahead: MAE=$0.833, RMSE=$1.198, direction accuracy=93.8%. B
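For readers unfamiliar with the one-step-ahead metrics quoted above, the sketch below shows one common way to compute MAE, RMSE, and direction accuracy. The definition of direction accuracy used here (predicted move and actual move, both measured from the previous actual value, share a sign) is an assumption; the audit's exact definition is not given.

```python
import math


def one_step_metrics(actual, predicted):
    """Compute (MAE, RMSE, direction accuracy) for one-step-ahead forecasts.

    `predicted[i]` is taken to be the forecast for period i, made after
    observing `actual[i - 1]`. Direction accuracy is the share of periods
    (from the second onward) where the predicted and actual moves from the
    previous actual value have the same sign. This definition is assumed.
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

    hits = 0
    steps = len(actual) - 1
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return mae, rmse, hits / steps
```

RMSE is always at least as large as MAE because squaring weights large misses more heavily, which is consistent with the $1.198 versus $0.833 pair reported above.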

Experiment confirmed oil Mar 17, 2026

An 11-trigger sell model with conviction hold override plus refined de

Lowest one-step MAE: $0.833 (13% better than v14b). Direction accuracy 93.8% (15/16). Sell model: 100% trigger accuracy,

Experiment confirmed oil Mar 17, 2026

An 8-layer geopolitical consensus pipeline with tier-based parameter c

v17.2 current. Cycle 3 example: R² 0.9610→0.9769 (+0.0159), MAPE 1.85%→1.26% (-0.59pp). 12 parameters accepted, 3 reject

Experiment confirmed oil Mar 16, 2026

Autoregressive error correction and spike reversion mechanics will eli

R² 0.9684 (+0.68pp). MAPE 1.57% (-0.32pp). One-step MAE $1.09 (-$0.54, 33% improvement). Direction accuracy 86.7%→93.3%

Experiment confirmed oil Mar 16, 2026

Sigmoid probability blending (vs binary threshold) combined with expan

MAE $0.951 (42% improvement from v13's $1.633). Autocorrelation 0.1651→-0.001 (eliminated). OOS R² 0.9699. P($150) eleva
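The contrast between a binary threshold and sigmoid blending can be illustrated with a small sketch. What the model actually blends is not visible in this entry, so the two regime values, the signal, the midpoint, and the steepness parameter below are all hypothetical.

```python
import math


def sigmoid_weight(signal, midpoint, steepness):
    """Smooth weight in (0, 1) that equals 0.5 when signal == midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (signal - midpoint)))


def binary_blend(signal, midpoint, low_regime, high_regime):
    """Hard switch: jumps from one regime value to the other at the midpoint."""
    return high_regime if signal >= midpoint else low_regime


def sigmoid_blend(signal, midpoint, steepness, low_regime, high_regime):
    """Soft switch: mixes the two regime values by the sigmoid weight."""
    w = sigmoid_weight(signal, midpoint, steepness)
    return (1.0 - w) * low_regime + w * high_regime
```

With a hard switch, a signal moving from just below to just above the midpoint flips the output by the full gap between regimes; the sigmoid version changes the output only gradually across the same range.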

Experiment confirmed oil Mar 15, 2026

Tail risk is vastly underestimated: adding historical analogues and de

R² 0.9498→0.9616 (+1.18pp). MAPE 1.97%→1.89%. OOS R² 0.8633→0.9131 (+4.98pp). P($120+) 15.4%→36.4%. Historical pass rate

email-voice (1)

brand-voice (1)

peon-notify (6)

context-curator (1)

Kiro CLI Factory (1)

investor-research (1)

openclaw (2)

OpenClaw (3)