A 6-subagent Karpathy ratchet targeting interview conversion rate can identify and fix the bottleneck between application submission and interview scheduling

March 28, 2026

interview-rate, ratchet, karpathy, career

Result: inconclusive

Key Findings

All 6 subagents completed. The LinkedIn scan found 0 interview signals from 105 submitted applications. All 12 recruiter messages were cold InMail, not responses to applications. Interview rate from LinkedIn = 0%. Focus shifted to Gmail.

Changelog

Date	Summary
2026-04-06	Audited: added Changelog, domain tag career, stamped last_audited
2026-03-29	Initial creation

Hypothesis

A 6-subagent Karpathy [[definitions/karpathy-ratchet|ratchet]] targeting interview conversion rate can identify and fix the bottleneck between application submission and interview scheduling. The system has attempted 384 LinkedIn applications (105 submitted successfully), 113 Direct applications, and 115 Greenhouse applications, but interview conversion has never been measured. Deploying 6 specialized subagents to instrument, measure, and optimize each stage of the post-submission funnel should make the interview rate visible and improvable.

Method

Six subagents were deployed on 2026-03-29, each with a 10-iteration budget and a specific focus area.

SA1: Email-to-application cross-reference

Problem: The checkGmailResponses function existed but was dead code, never wired into any pipeline
Action: Analyzed the function signature and integration points, mapped email patterns (confirmation, rejection, interview request) to application records
Iterations used: 4/10
Outcome: Identified the wiring gap; function needs OAuth token refresh (see reference_gmail_oauth.md) and matching logic to connect email subjects/senders to submitted application company names
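
The matching logic described above could be sketched roughly as follows. This is a minimal illustration, not the contents of checkGmailResponses: the keyword patterns, the helper names (`classify_email`, `match_company`), and the substring-matching rule are all assumptions that would need tuning against real mail.

```python
import re
from typing import Optional

# Hypothetical keyword patterns (assumption). The first matching category wins,
# so rejection is checked before interview_request and confirmation.
PATTERNS = [
    ("rejection", re.compile(r"unfortunately|not moving forward|other candidates", re.I)),
    ("interview_request", re.compile(r"\binterview|\bschedule a (call|chat)\b|\bavailability\b", re.I)),
    ("confirmation", re.compile(r"application received|thank you for applying|thanks for applying", re.I)),
]


def classify_email(subject: str, body: str = "") -> str:
    """Return 'rejection', 'interview_request', 'confirmation', or 'other'."""
    text = f"{subject}\n{body}"
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "other"


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_company(sender: str, subject: str, companies: list[str]) -> Optional[str]:
    """Return the first applied-to company whose name appears in the sender or subject."""
    haystack = normalize(sender) + " " + normalize(subject)
    for company in companies:
        key = normalize(company)
        if key and key in haystack:
            return company
    return None
```

Plain substring matching will produce false positives for very short company names, so a real pipeline would likely need a stricter rule (for example, matching on the sender domain only).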

SA2: LinkedIn message scanning

Problem: No visibility into LinkedIn message responses to applications
Action: Extracted and classified all LinkedIn conversations using [[definitions/chrome-devtools-protocol|CDP]] DOM access
Data: 24 total conversations extracted, 12 classified as recruiter messages, 12 as other (connection requests, content notifications)
Key finding: All 12 recruiter messages were cold InMail outreach (recruiters reaching out proactively), NOT responses to submitted applications
Iterations used: 3/10
Outcome: LinkedIn interview signal from Easy Apply = 0 out of 105 submissions
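
One way to separate cold InMail from genuine application responses is sketched below. The source does not record the rule SA2 actually applied, so this rule is an assumption: a recruiter message counts as a response only if it comes from a company that was applied to and arrives on or after the submission date. The `Conversation` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Conversation:
    sender_company: str        # hypothetical field extracted from the thread
    first_message_date: date   # date of the first inbound message
    is_recruiter: bool         # result of an earlier recruiter/other split


def classify_conversation(conv: Conversation, applications: dict[str, date]) -> str:
    """applications maps lowercase company name -> date the application was submitted."""
    if not conv.is_recruiter:
        return "other"
    applied_on = applications.get(conv.sender_company.lower())
    if applied_on is not None and conv.first_message_date >= applied_on:
        return "application_response"
    return "cold_inmail"
```

Under this rule, a result like the one above (12 recruiter threads, 0 application responses) means that no recruiter thread matched an applied-to company after its submission date.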

SA3: Channel attribution

Problem: No per-channel interview tracking existed
Action: Designed schema for tracking interview outcomes by channel (LinkedIn, Direct, Greenhouse, Indeed)
Iterations used: 2/10
Outcome: Schema designed but no interview data to populate. Created the tracking infrastructure for future use.
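
The schema itself is not reproduced in this write-up. As an illustration only, a per-channel record and rate calculation might look like the sketch below. The field names and the `interview_rates` helper are hypothetical; the four channels are the ones listed above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Channel(str, Enum):
    LINKEDIN = "linkedin"
    DIRECT = "direct"
    GREENHOUSE = "greenhouse"
    INDEED = "indeed"


@dataclass(frozen=True)
class ApplicationRecord:
    application_id: str
    channel: Channel
    # None means the outcome has not been measured yet, which is distinct from False.
    interview_requested: Optional[bool] = None


def interview_rates(records: Iterable[ApplicationRecord]) -> dict[Channel, Optional[float]]:
    """Interview rate per channel; None where a channel has no measured outcomes."""
    measured: Counter = Counter()
    interviews: Counter = Counter()
    for rec in records:
        if rec.interview_requested is None:
            continue
        measured[rec.channel] += 1
        interviews[rec.channel] += int(rec.interview_requested)
    return {ch: (interviews[ch] / measured[ch] if measured[ch] else None) for ch in Channel}
```

Keeping "not measured" separate from "no interview" matters here: with no outcome data, every channel reports `None` rather than a misleading 0%.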

SA4: J-Score calibration

Problem: No feedback loop between interview outcomes and J-Score predictions
Action: Analyzed score band distribution across submitted applications
Score bands: 70-75 (12% of submissions), 75-80 (28%), 80-85 (35%), 85-90 (18%), 90+ (7%)
Iterations used: 3/10
Outcome: Without interview outcome data, calibration is impossible. Documented the score distribution baseline for future comparison when outcomes become available.
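
A baseline like the one above can be recomputed from raw scores with a small bucketing helper. The sketch below assumes half-open bands (a score of exactly 75 falls in 75-80), which the source does not specify; `share_below` additionally estimates how much volume a higher score floor would remove.

```python
from collections import Counter

# Lower edges of the bands (assumption: bands are half-open, e.g. [70, 75)).
BAND_EDGES = [70, 75, 80, 85, 90]


def score_band(score: float) -> str:
    """Map a score to a band label such as '70-75' or '90+'."""
    if score < BAND_EDGES[0]:
        return f"<{BAND_EDGES[0]}"
    if score >= BAND_EDGES[-1]:
        return f"{BAND_EDGES[-1]}+"
    for low, high in zip(BAND_EDGES, BAND_EDGES[1:]):
        if low <= score < high:
            return f"{low}-{high}"
    raise ValueError(f"score is not a comparable number: {score!r}")


def band_shares(scores: list[float]) -> dict[str, float]:
    """Fraction of submissions in each band that occurs; empty input gives {}."""
    counts = Counter(score_band(s) for s in scores)
    return {band: n / len(scores) for band, n in counts.items()}


def share_below(scores: list[float], floor: float) -> float:
    """Fraction of submissions that a given score floor would have excluded."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < floor) / len(scores)
```

Once interview outcomes exist, the same band labels can key a per-band interview rate, which is the comparison that calibration needs.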

SA5: Website funnel optimization

Problem: Waitlist conversion rate unmeasured
Action: Deployed hero-copy-v1 A/B experiment (50/50 split, control vs. variant-a)
10-page funnel mapped: ToFu (/about, /changelog, /security), MoFu (/how-it-works, /demo, /contact), BoFu (/, /pricing, /auth/signin)
Iterations used: 2/10
Outcome: A/B test running but only 3/10 pages instrumented (30%). See experiments/jobs-apply/2026-03-28-website-funnel-ab-testing for details.
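
The source does not describe how visitors are assigned to the two arms. A common approach to a stable 50/50 split is deterministic hashing, sketched here with the experiment and variant names from above; `visitor_id` and both function names are hypothetical.

```python
import hashlib

EXPERIMENT = "hero-copy-v1"
VARIANTS = ("control", "variant-a")


def assign_variant(visitor_id: str, experiment: str = EXPERIMENT) -> str:
    """Hash the experiment name and visitor id so each visitor always sees the same arm."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


def instrumentation_coverage(instrumented: int, total: int) -> float:
    """Share of funnel pages that emit experiment events, e.g. 3 of 10 -> 0.3."""
    return instrumented / total
```

Including the experiment name in the hash input keeps assignments independent across experiments, so a visitor in `control` here is not automatically in `control` for a later test.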

SA6: Unified experiment migration

Problem: 7 different experiment tracking formats across the project (JSONL, markdown, YAML, inline code comments, git commit messages, dashboard notes, spreadsheet)
Action: Migrated all experiment records to a unified experiments.jsonl format
Iterations used: 4/10
Outcome: 35 entries consolidated in experiments.jsonl from 7 source formats
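
The unified record layout is not shown in this write-up. The sketch below illustrates the general shape of such a migration with hypothetical field names (`id`, `date`, `hypothesis`, `result`, `source_format`): each source record is normalized into one dict and written as one JSON object per line.

```python
import json
from typing import Iterable, TextIO


def to_unified(record: dict, source_format: str) -> dict:
    """Normalize one source record; 'id' and 'date' are required, the rest default."""
    return {
        "id": record["id"],
        "date": record["date"],
        "hypothesis": record.get("hypothesis", ""),
        "result": record.get("result", "unknown"),
        "source_format": source_format,  # provenance, e.g. "markdown" or "yaml"
    }


def write_jsonl(records: Iterable[dict], out: TextIO) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for rec in records:
        out.write(json.dumps(rec, sort_keys=True) + "\n")
        count += 1
    return count


def read_jsonl(src: TextIO) -> list[dict]:
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in src if line.strip()]
```

Recording `source_format` on every entry keeps the migration auditable: the count per format can be checked against the original sources after consolidation.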

Results

Inconclusive. The primary hypothesis, that the ratchet can identify and fix the bottleneck, was only partially validated. The bottleneck was identified (zero interview signal from LinkedIn, unmeasured signal from email), but fixing it requires infrastructure (Gmail OAuth wiring, email pattern matching) that was not completed within the single-day experiment window.

The most significant finding was negative: 105 LinkedIn Easy Apply submissions produced zero detectable interview responses. All recruiter contact on LinkedIn was unsolicited cold outreach, not application responses.

Findings

LinkedIn Easy Apply shows no detectable interview conversion for this profile. 105 submissions, 0 interview signals. This could mean: (a) responses come via email rather than LinkedIn messages, (b) the application quality or targeting is poor, (c) Easy Apply applications are deprioritized by recruiters, or (d) the sample period is too short. Most likely a combination of (a) and (c).
Email is the missing measurement layer. Most companies send interview invitations via email, not LinkedIn messages. Until the Gmail cross-reference (SA1) is wired up, the interview rate cannot be measured. This is the single highest-priority infrastructure gap.
Cold InMail creates measurement noise. 12 of 24 LinkedIn messages were recruiter cold outreach, which could be confused with application responses without careful classification. The CDP-based extraction correctly distinguished these, but any automated system needs this classification logic.
Score distribution suggests over-permissive threshold. 12% of submissions scored 70-75, which is just above the floor. If these borderline submissions have zero interview conversion (plausible), raising MATCH_SCORE_FLOOR to 75 would reduce volume by 12% while potentially improving average application quality.
Experiment format fragmentation is a real problem. 7 formats across a single project made it impossible to query experiment history programmatically. The consolidation to experiments.jsonl (35 entries) was overdue.

Next Steps

The critical path is wiring up Gmail OAuth and the checkGmailResponses function to close the measurement gap. Without email-based interview tracking, all optimization is blind. Secondary priority is raising the score floor from 70 to 75 as an experiment once interview data is flowing. The website funnel work continues independently in experiments/jobs-apply/2026-03-28-website-funnel-ab-testing.