Experiments

Hypothesis-driven tests with systematic execution. Each experiment documents what was tried, what surprised us, and what to try next.

Confirmed: 34 (72% success rate)
Refuted: 0
Pending: 12
Total: 47 across 10 projects

peon-notify (6)

dakka (5)

jobs-apply (17)

Experiment pending jobs-apply Apr 4, 2026

A pre-flight key validation step at server start that tests each API k

Experiment pending jobs-apply Apr 4, 2026

Applying the Karpathy ratchet methodology to jobs-apply's interview co

Experiment pending jobs-apply Apr 4, 2026

Extracting the gaussian behavioral timing module from jobs-apply into

Experiment confirmed jobs-apply Apr 2, 2026

Deploying 10 targeted fixes across easy-apply.ts, linkedin-adapter.ts,

6/6 = 100% after select fix deployed. 9/13 total for the day (3/7 before fixes, 6/6 after). From Run 1 (40%) to Run 10 (

Experiment confirmed jobs-apply Apr 2, 2026

Proactive modal scrolling and CDP select placeholder detection will re

2/3 = 67% during quiet hours. 1 failure: 240s timeout on screening question dropdown (Databricks/Pyspark). CDP select de

Experiment confirmed jobs-apply Apr 1, 2026

Fixing silent modal failures (F40), verification false negatives (F41)

All three failure patterns addressed and deployed. F40 scroll fix prevents modal-bottom submit button from being out of

Experiment inconclusive jobs-apply Mar 29, 2026

A 6-subagent Karpathy ratchet targeting interview conversion rate can

All 6 subagents complete. LinkedIn scan revealed 0 interview signals from 105 submitted applications. All 12 recruiter m

Experiment confirmed jobs-apply Mar 29, 2026

A click-and-verify loop that checks modal state after each click strat

5/6 = 83% success. All 6 attempts opened modal on strategy 1 (first try). 1 failure: Save Application dialog blocked adv
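A click-and-verify loop of the kind described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the strategy names, the `click` callables, and the `modal_is_open` check are hypothetical placeholders, not functions from jobs-apply.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a (name, click) pair; both are hypothetical placeholders.
ClickStrategy = Tuple[str, Callable[[], None]]


def click_and_verify(
    strategies: Sequence[ClickStrategy],
    modal_is_open: Callable[[], bool],
) -> Optional[str]:
    """Try each click strategy in order and verify the modal state after each.

    Returns the name of the first strategy after which the modal is open,
    or None if every strategy was tried without success.
    """
    for name, click in strategies:
        try:
            click()
        except Exception:
            # Assumption: a click that raises counts as a miss; move on.
            continue
        # Verify the outcome instead of trusting that the click worked.
        if modal_is_open():
            return name
    return None
```

Returning the winning strategy's name makes it straightforward to log which strategy succeeded, in the spirit of the "opened modal on strategy 1" observation above.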

Experiment confirmed jobs-apply Mar 29, 2026

Database-backed company intelligence with priority scoring improves Di

11 new table columns. Priority scoring operational. Direct channel submission rate stable at 77.9% (113/145). Cross-chan

Experiment pending jobs-apply Mar 28, 2026

A/B testing marketing page copy and layout will improve waitlist conve

Experiment running. Only 3/10 pages instrumented (30%). Planned: 10 iterations covering CTA variants, flow tracking, dem

Experiment confirmed jobs-apply Mar 26, 2026

A three-layer scoring system (semantic embeddings + structured feature

592 tests passing. Score audit table logs ALL scores including sub-70 rejections. 3,742 legacy scores backfilled from ar

Experiment confirmed jobs-apply Mar 25, 2026

A multi-iteration anti-detection suite (gaussian timing, reading simul

Account unrestricted since 2026-03-26. Run 7 (2026-03-29) achieved 83% with all anti-detection measures active. P0-P1 (i
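Gaussian timing, as named in the hypothesis above, might look something like the sketch below. It assumes the idea is to draw each pause from a normal distribution and clamp it to a plausible range; the function name, parameters, and example values are illustrative and not taken from the jobs-apply module.

```python
import random
from typing import Optional


def gaussian_delay(
    mean_s: float,
    stddev_s: float,
    min_s: float,
    max_s: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a delay in seconds drawn from a normal distribution.

    The sample is clamped to [min_s, max_s] so an outlier never yields a
    negative or implausibly long pause.
    """
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    source = rng if rng is not None else random.Random()
    sample = source.gauss(mean_s, stddev_s)
    return max(min_s, min(max_s, sample))
```

A caller would typically pass the result to `time.sleep` between actions. Injecting `rng` keeps the behavior reproducible in tests.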

Experiment confirmed jobs-apply Mar 22, 2026

Connecting to the user's real Chrome via CDP will be more resistant to

CDP mode disables ALL stealth scripts, relying entirely on real Chrome session. 22-38s click gaps were pure LLM inferenc

Experiment confirmed jobs-apply Mar 15, 2026

Systematic fix of individual failure points will drive LinkedIn Easy A

7 runs, 40% to 83% final. 26 individual failure fixes (F1-F39). Account restriction in Run 6 was the critical learning.

Experiment confirmed jobs-apply Mar 9, 2026

Deploying 4 parallel Claude Code agents in isolated git worktrees can

679 sessions (522 main + 157 worktree) in 2 days. Agents successfully parallelized adapter work. Merged into baseline co

Experiment confirmed jobs-apply Feb 27, 2026

A provider-transparent rate limiter keyed by API key + provider can pr

Zero consumer changes needed. Rate limiting invisible inside provider layer. Prevented cost blowouts during early develo
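One way a limiter keyed by provider and API key could be structured is sketched below. The token-bucket algorithm is an assumption chosen for illustration (the entry does not say which algorithm was used), and the class and parameter names are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        # Refill for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class KeyedRateLimiter:
    """Keeps one independent bucket per (provider, api_key) pair."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, provider: str, api_key: str) -> bool:
        now = self._clock()
        key = (provider, api_key)
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, now)
            self._buckets[key] = bucket
        return bucket.try_acquire(now)
```

If the provider layer calls `allow` before each outbound request, callers of that layer never see the limiter, which is consistent with the "zero consumer changes" result noted above.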

Experiment confirmed jobs-apply Feb 26, 2026

Gemini Flash vision can analyze ATS page screenshots to extract form s

Vision approach worked for page understanding but was eventually replaced by DOM-based extraction for form filling. Visi

redcorsair (3)

bloomnet (5)

oil (7)

Experiment pending oil Apr 4, 2026

Google's TimesFM foundation model, applied to the sequence of DQI/Brie

Experiment confirmed oil Mar 18, 2026

A 3-round quantitative audit with one-step-ahead forecasting, multi-se

R²=0.9752, MAPE=1.49%, OOS R²=0.9755. One-step-ahead: MAE=$0.833, RMSE=$1.198, direction accuracy=93.8%. B
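For reference, one-step-ahead metrics like those reported above can be computed as in the sketch below. MAE, RMSE, and MAPE follow their standard definitions. The direction-accuracy definition (sign of the predicted move from the previous actual value versus the sign of the actual move) is an assumption, since the entry does not spell out how it was measured.

```python
import math
from typing import Dict, Sequence


def one_step_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, RMSE, MAPE (%) and direction accuracy (%).

    predicted[t] is the forecast for actual[t], made with data up to t-1.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # Assumed definition: a hit is when the predicted move and the actual
    # move from the previous actual value share a sign. Zero moves are misses.
    hits = 0
    for t in range(1, n):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    direction = 100.0 * hits / (n - 1)

    return {"mae": mae, "rmse": rmse, "mape": mape, "direction": direction}
```

Note that direction accuracy is scored over n-1 transitions, so a series of 17 points yields 16 scored moves.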

Experiment confirmed oil Mar 18, 2026

An 11-trigger sell model with conviction hold override plus refined de

Lowest one-step MAE: $0.833 (13% better than v14b). Direction accuracy 93.8% (15/16). Sell model: 100% trigger accuracy,

Experiment confirmed oil Mar 18, 2026

An 8-layer geopolitical consensus pipeline with tier-based parameter c

v17.2 current. Cycle 3 example: R² 0.9610→0.9769 (+0.0159), MAPE 1.85%→1.26% (-0.59pp). 12 parameters accepted, 3 reject

Experiment confirmed oil Mar 17, 2026

Autoregressive error correction and spike reversion mechanics will eli

R² 0.9684 (+0.68pp). MAPE 1.57% (-0.32pp). One-step MAE $1.09 (-$0.54, 33% improvement). Direction accuracy 86.7%→93.3%

Experiment confirmed oil Mar 17, 2026

Sigmoid probability blending (vs binary threshold) combined with expan

MAE $0.951 (42% improvement from v13's $1.633). Autocorrelation 0.1651→-0.001 (eliminated). OOS R² 0.9699. P($150) eleva
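The contrast between sigmoid blending and a binary threshold can be illustrated with the sketch below. It assumes "sigmoid blending" means replacing a hard switch between two values with a logistic weight centered on the threshold; the function names, the steepness parameter, and the example inputs are hypothetical and not taken from the oil model.

```python
import math


def sigmoid_weight(x: float, threshold: float, steepness: float) -> float:
    """Logistic weight in (0, 1), equal to 0.5 exactly at the threshold."""
    z = steepness * (x - threshold)
    # Numerically stable form: avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_blend(
    x: float, threshold: float, steepness: float, below: float, above: float
) -> float:
    """Smoothly interpolate between `below` and `above` as x crosses threshold."""
    w = sigmoid_weight(x, threshold, steepness)
    return (1.0 - w) * below + w * above


def binary_switch(x: float, threshold: float, below: float, above: float) -> float:
    """Hard threshold for comparison: jumps from `below` to `above` at threshold."""
    return above if x >= threshold else below
```

With a hard switch, an input just under the threshold and one just over it produce very different outputs. The blended version changes gradually, and a larger `steepness` makes it approach the binary behavior.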

Experiment confirmed oil Mar 16, 2026

Tail risk is vastly underestimated: adding historical analogues and

R² 0.9498→0.9616 (+1.18pp). MAPE 1.97%→1.89%. OOS R² 0.8633→0.9131 (+4.98pp). P($120+) 15.4%→36.4%. Historical pass rate

context-curator (1)

quick-fin (1)

kiro-cli-factory (1)

pirate-ship (1)