j-score-v2-matching
User needs to score how well a candidate matches a job posting using multi-layer fusion scoring
Changelog
260420: multiple edits
- v_migrate: Changelog migrated from table to YYMMDD H3 format per versioning-standard rule 2 (V1.6 of skills upgrade plan)
- v6: Added license, sources per V6.1/V6.2 of skills upgrade plan.
- v1.5: Added "## Quality Checks" section per V1.5 of ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md
260403: Added Visual Enrichment section + self-improving-agent-patterns cross-reference
260331: Initial creation
Description
Three-layer fusion scoring engine that replaces a legacy single-LLM scorer. An audit of that scorer identified 17 weaknesses, including no semantic understanding, non-deterministic output, zero feedback loop, no skills graph, and no score calibration. The core insight is that no single scoring method is sufficient for job matching: deterministic features miss semantic nuance, embeddings miss structured constraints, and LLMs are expensive and unreliable alone. Fusing all three, with confidence-weighted LLM contribution, produces scores that are both cheap and accurate.
Deployed 2026-03-26 in projects/jobs-apply/_index. All scores (including sub-70 rejections) are logged to the score_audit table for future topics/confidence-weighted-score-fusion calibration and topics/interview-rate-optimization feedback loops.
Layer 1: Semantic Embedder (30-50ms, $0): Xenova/all-MiniLM-L6-v2, 22MB quantized, 384-dim. Embeds job text and candidate text, cosine similarity mapped to 0-100. Catches semantic matches that keyword overlap misses (“ML Engineer” clusters near “AI Researcher”). Lazy singleton; graceful fallback to 50 (neutral) if load fails.
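The similarity step of Layer 1 might look roughly like the sketch below. The source only says cosine similarity is "mapped to 0-100" with a neutral fallback of 50, so the linear clamp-and-scale mapping and the names cosineSimilarity, similarityToScore, and semanticScore are assumptions made for illustration, not the actual contents of semantic-embedder.ts.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMPTION: linear mapping that clamps negative similarity to 0.
// The real mapping from cosine similarity to 0-100 is not documented here.
function similarityToScore(similarity: number): number {
  const clamped = Math.min(1, Math.max(0, similarity));
  return Math.round(clamped * 100);
}

// Neutral score used when embeddings are unavailable (model failed to load).
const NEUTRAL_FALLBACK = 50;

function semanticScore(
  jobVec: number[] | null,
  candidateVec: number[] | null,
): number {
  if (!jobVec || !candidateVec) return NEUTRAL_FALLBACK;
  return similarityToScore(cosineSimilarity(jobVec, candidateVec));
}
```

Returning the neutral 50 rather than 0 keeps a failed model load from silently rejecting every job.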
Layer 2: Structured Features (<1ms, $0): Five weighted sub-dimensions: Skills overlap (0.35, taxonomy-based: exact=1.0, sibling=0.8, parent=0.6, 10 categories, 200+ skills), Experience alignment (0.20), Location compatibility (0.20, metro area grouping), Salary fit (0.15), Tech stack alignment (0.10). Returns matchedSkills[] and skillGaps[] forwarded to Layer 3.
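As a rough illustration of how the five Layer 2 weights combine, the sketch below computes a weighted sum over sub-scores on a 0-100 scale. The weights and taxonomy credits come from the description above; the SubScores shape, the averaging rule in skillsOverlap, and the treatment of an empty skill list are assumptions, not the logic in structured-scorer.ts.

```typescript
// Sub-dimension scores, each assumed to be on a 0-100 scale.
type SubScores = {
  skills: number;
  experience: number;
  location: number;
  salary: number;
  techStack: number;
};

// Weights as listed in the Layer 2 description; they sum to 1.0.
const STRUCTURED_WEIGHTS: SubScores = {
  skills: 0.35,
  experience: 0.20,
  location: 0.20,
  salary: 0.15,
  techStack: 0.10,
};

// Taxonomy-based credit per required skill.
type SkillMatch = "exact" | "sibling" | "parent" | "none";
const MATCH_CREDIT: Record<SkillMatch, number> = {
  exact: 1.0,
  sibling: 0.8,
  parent: 0.6,
  none: 0,
};

// ASSUMPTION: skills overlap is the mean credit across required skills,
// scaled to 0-100. An empty requirement list scores 0 in this sketch.
function skillsOverlap(matches: SkillMatch[]): number {
  if (matches.length === 0) return 0;
  const total = matches.reduce((sum, m) => sum + MATCH_CREDIT[m], 0);
  return (total / matches.length) * 100;
}

// Weighted sum of the five sub-dimensions.
function structuredScore(sub: SubScores): number {
  const keys = Object.keys(STRUCTURED_WEIGHTS) as (keyof SubScores)[];
  return keys.reduce((sum, k) => sum + STRUCTURED_WEIGHTS[k] * sub[k], 0);
}
```

For example, one exact, one sibling, one parent, and one unmatched skill average to a credit of 0.6, giving a skills sub-score of about 60 under this assumed rule.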
Layer 3: Enhanced LLM (~2s, ~$0.001): DeepSeek v3 via OpenRouter. Gated: only runs when Layer 2 >= 35 (validated on 2,874 historical jobs, zero false negatives, saves ~60% API cost). Receives Layer 2 sub-scores and skill gaps as context. Scores cultural fit, career trajectory, hidden requirements, domain nuance, and red flags. Returns score, confidence (low/medium/high), reasoning, key_matches[], concerns[]. Fails gracefully with weight redistribution.
Fusion: finalScore = 0.35 * L1 + 0.25 * L2 + 0.40 * L3. Confidence adjustment: low = 0.6x LLM weight, medium = 1.0x, high = 1.2x (before renormalization). No-LLM fallback: 0.55 / 0.45 semantic/structured. Floor: 70+ to proceed to tailoring.
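The fusion arithmetic can be worked through with a small sketch. It assumes "renormalization" means dividing each weight by the sum of the adjusted weights so they total 1.0. That interpretation, and the helper names effectiveWeights and fuse, are assumptions for illustration and are not taken from score-fusion.ts.

```typescript
type Confidence = "low" | "medium" | "high";

interface Weights {
  semantic: number;
  structured: number;
  llm: number;
}

const BASE_WEIGHTS: Weights = { semantic: 0.35, structured: 0.25, llm: 0.40 };
const NO_LLM_WEIGHTS: Weights = { semantic: 0.55, structured: 0.45, llm: 0 };
const CONFIDENCE_MULTIPLIER: Record<Confidence, number> = {
  low: 0.6,
  medium: 1.0,
  high: 1.2,
};

// Scale the LLM weight by confidence, then renormalize so weights sum to 1.
// ASSUMPTION: renormalization is a plain division by the adjusted total.
function effectiveWeights(confidence: Confidence | null): Weights {
  if (confidence === null) return NO_LLM_WEIGHTS;
  const llm = BASE_WEIGHTS.llm * CONFIDENCE_MULTIPLIER[confidence];
  const total = BASE_WEIGHTS.semantic + BASE_WEIGHTS.structured + llm;
  return {
    semantic: BASE_WEIGHTS.semantic / total,
    structured: BASE_WEIGHTS.structured / total,
    llm: llm / total,
  };
}

// l3 is null when the LLM layer was skipped, gated out, or failed.
function fuse(
  l1: number,
  l2: number,
  l3: number | null,
  confidence: Confidence | null,
): number {
  const w = effectiveWeights(l3 === null ? null : (confidence ?? "medium"));
  return w.semantic * l1 + w.structured * l2 + w.llm * (l3 ?? 0);
}
```

Under this assumption, a low-confidence LLM result has its weight scaled from 0.40 to 0.24, which becomes 0.24 / 0.84, or about 0.29, after renormalization. With L1 = 80, L2 = 60, L3 = 90 at medium confidence the fused score is 28 + 15 + 36 = 79; with no LLM result it is 0.55 * 80 + 0.45 * 60 = 71.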
Interface
import { JScoreV2, type JScoreOptions } from '@jobs-apply/ai';
const scorer = new JScoreV2(openRouterProvider, {
  weights: { semantic: 0.35, structured: 0.25, llm: 0.40 },
  llmThreshold: 35,
  skipSemantic: false,
  skipLlm: false,
});
// Single job
const result = await scorer.scoreJob(job, profile);
// result: { score, semantic, structured, llm, llmInvoked, reasoning, key_matches, concerns, weights, usage }
// Batch (Layer 1+2 for all, Layer 3 only for above-threshold)
const results = await scorer.batchScore(jobs, profile);
// results: Map<jobUrl, JScoreResult>
Audit logging is handled by ScoreAuditRepository.log() which persists every score (including rejections) with full layer breakdown, outcome tracking, and 5-point bucket distribution for calibration analysis.
Five-layer enforcement ensures no sub-70 job is ever submitted:
- Config: server.ts sets minMatchScore: 70
- Hard floor: match.ts exports MATCH_SCORE_FLOOR = 70, clamps constructor arg
- Auto-approve gate: channel-run-loop.ts checks score before auto-approving
- Submit guard: submit.ts rejects sub-floor scores pre-submit
- Junk filter: discover.ts regex patterns block non-job listings
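The hard floor and submit guard could be sketched as follows. The constant name MATCH_SCORE_FLOOR comes from the list above; the helper names clampMinMatchScore and assertSubmittable are hypothetical, and the reading of "clamps constructor arg" as "raise any lower value to the floor" is an assumption.

```typescript
const MATCH_SCORE_FLOOR = 70;

// ASSUMPTION: "clamping" means a configured minimum below the floor is
// raised to the floor, while stricter (higher) minimums are kept as-is.
function clampMinMatchScore(requested: number | undefined): number {
  if (requested === undefined || Number.isNaN(requested)) {
    return MATCH_SCORE_FLOOR;
  }
  return Math.max(MATCH_SCORE_FLOOR, requested);
}

// Hypothetical pre-submit guard: throws instead of submitting a sub-floor job.
function assertSubmittable(
  score: number,
  minMatchScore: number = MATCH_SCORE_FLOOR,
): void {
  const floor = clampMinMatchScore(minMatchScore);
  if (score < floor) {
    throw new Error(`Score ${score} is below the floor of ${floor}`);
  }
}
```

Clamping at construction time means a misconfigured minMatchScore of, say, 50 cannot weaken the floor, and the separate submit guard catches any path that bypasses the matcher.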
Provenance
Origin: 17-weakness audit (W1-W17) of the legacy single-LLM scorer from the archived autosearch system. Key failures: no semantic pre-filter caused false negatives on semantically-similar titles, non-deterministic LLM-only scoring produced inconsistent results across runs, zero feedback loop meant no learning from outcomes.
Historical validation: 3,742 legacy scores backfilled into score_audit from the archived system during the data import (2026-03-26). LLM invocation threshold of 35 validated against 2,874 of those historical jobs with zero false negatives.
Test coverage: 592 tests passing across the monorepo (vitest, 49 files). Backward compatible: legacy JobMatcher still works if setJScoreV2() is not called; computeHeuristicScore() still exported for tests.
Deployed: 2026-03-26. Integrated into the unified pipeline (DISCOVER -> MATCH -> TAILOR -> SUBMIT -> TRACK) where every cycle runs all 5 stages.
Usage Notes
- Embedder warmup: Call warmupEmbedder() at server startup to avoid cold-start latency on the first job. The model is ~22MB and takes 1-2s to load.
- Cost gate is the key design lever. The LLM threshold (default 35) controls the cost/accuracy tradeoff. Raising it saves money but risks false negatives on jobs where the LLM would have scored high despite low structured overlap. The current threshold was validated empirically against 2,874 historical jobs with zero false negatives.
- Confidence adjustment prevents LLM hallucination from dominating. When the LLM returns low confidence (sparse job description, unclear requirements), its 0.40 weight is scaled to 0.24 before renormalization. This is critical: without it, a hallucinated high score from a vague job posting could push a bad match over the 70 threshold.
- Skill gaps flow end-to-end. structuredScore.skillGaps[] feeds into the LLM prompt (so it can reason about whether gaps are dealbreakers) and into the final concerns[] output (so the tailoring stage knows what to address in the cover letter).
- Score audit enables closed-loop optimization. The score_audit table, with outcome tracking (via updateOutcome()) and getScoreDistribution(), provides the data for future score calibration: correlating J-Score predictions with actual interview/rejection outcomes from Gmail scanning.
- Batch mode for discovery phases. batchScore() runs Layers 1+2 for all jobs first, then batches Layer 3 calls in chunks of 5 for above-threshold jobs. This is more efficient than scoring one job at a time during the DISCOVER phase.
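The gate-then-chunk pattern described for batch mode might look like the sketch below. The threshold of 35 and chunk size of 5 come from this document; the Prelim shape, the helper names, and the scoreWithLlm callback are hypothetical stand-ins, and running each chunk with Promise.all is an assumption about how the chunks are executed.

```typescript
const LLM_THRESHOLD = 35;
const LLM_CHUNK_SIZE = 5;

// Hypothetical result of the cheap Layer 1+2 pass for one job.
interface Prelim {
  jobUrl: string;
  structured: number;
}

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Only jobs whose structured score meets the threshold reach Layer 3.
function selectForLlm(prelim: Prelim[]): Prelim[] {
  return prelim.filter((p) => p.structured >= LLM_THRESHOLD);
}

// ASSUMPTION: each chunk runs concurrently and chunks run one after another.
async function scoreAboveThreshold(
  prelim: Prelim[],
  scoreWithLlm: (jobUrl: string) => Promise<number>,
): Promise<Map<string, number>> {
  const results = new Map<string, number>();
  for (const group of chunk(selectForLlm(prelim), LLM_CHUNK_SIZE)) {
    const scores = await Promise.all(group.map((p) => scoreWithLlm(p.jobUrl)));
    group.forEach((p, i) => results.set(p.jobUrl, scores[i]));
  }
  return results;
}
```

Processing chunks sequentially caps the number of concurrent LLM requests at five, which is one plausible reason for chunking rather than firing all calls at once.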
Key files: packages/ai/src/scoring/score-fusion.ts (orchestrator + fusion), semantic-embedder.ts (Layer 1), structured-scorer.ts (Layer 2), enhanced-llm-scorer.ts (Layer 3), skills-taxonomy.ts (200+ skills), packages/database/src/repositories/score-audit.ts (audit trail), packages/engine/src/pipeline/match.ts (pipeline integration + floor enforcement).
Quality Checks
- All 3 layers fire for scores above LLM threshold. score_audit row has non-null semantic, structured, llm for every row with structured >= 35.
- Composite score in [0, 100]. SELECT MIN(score), MAX(score) FROM score_audit WHERE created > today returns values in range.
- Every invocation writes score_audit. Including rejections (score < 70). SELECT COUNT(*) matches upstream invocation count.
- LLM confidence weighting applied. Low-confidence LLM results show an LLM weight of ~0.24 (before renormalization) in audit breakdown; high ~0.48.
- 5-layer floor enforcement. Config = 70, MATCH_SCORE_FLOOR = 70, auto-approve checks score, submit guard rejects, junk filter blocks non-jobs. grep -r 'minMatchScore\|MATCH_SCORE_FLOOR' packages/ shows all 5.
- Batch mode layering. batchScore() runs L1+L2 for all jobs first, then L3 only for above-threshold; no mixed ordering.
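The first two checks above can also be expressed as an in-memory validation over fetched audit rows. This is only a sketch: the ScoreAuditRow shape is a hypothetical subset of the score_audit columns named in the checks, and findViolations is not an existing function in the repository.

```typescript
// Hypothetical subset of a score_audit row, using the column names above.
interface ScoreAuditRow {
  score: number;
  semantic: number | null;
  structured: number | null;
  llm: number | null;
}

const AUDIT_LLM_THRESHOLD = 35;

// Returns a human-readable message per violated check; empty means clean.
function findViolations(rows: ScoreAuditRow[]): string[] {
  const problems: string[] = [];
  rows.forEach((row, i) => {
    // Check: composite score must lie in [0, 100].
    if (!(row.score >= 0 && row.score <= 100)) {
      problems.push(`row ${i}: composite score ${row.score} outside [0, 100]`);
    }
    // Check: rows at or above the LLM threshold need all three layer scores.
    if (row.structured !== null && row.structured >= AUDIT_LLM_THRESHOLD) {
      if (row.semantic === null || row.llm === null) {
        problems.push(`row ${i}: missing layer score above LLM threshold`);
      }
    }
  });
  return problems;
}
```

A check like this could run on a sample of recent rows after each deployment, with any returned message treated as a failed quality check.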
Visual Enrichment
| Medium | Type | Description |
|---|---|---|
| R | DST histogram | Score distribution |
| R | COR scatter+trend | Semantic vs structured vs composite |
| Figma | Flowchart | 3-layer fusion: semantic -> structured -> LLM -> composite |
Self-Improvement Cross-Reference
Pattern 3 (Metric Ratchet): scoring weights were tuned via karpathy-ratchet iterations. For the master reference on all 6 self-improvement patterns, see skills/self-improving-agent-patterns.