Journal

Skills-dimension upgrade: MiniMax benchmark, Rust scanner, 14 tickets shipped, 4 deferred with restart instructions

April 19, 2026

skills, vault-scanners, minimax-benchmark, rust-migration

Signal

Audited the skills dimension against the MiniMax-AI/skills benchmark, shipped 14 tickets across Phases 1-6 of a written upgrade plan, and deferred 4 tickets with concrete next-session restart instructions. The vault now has a 9-scanner Rust gate (rusty-bloomnet/crates/vault-scanners/src/skills.rs) that runs under rv audit --dimension skills, a pre-commit hook that blocks stale version stamps, a daily staleness alert, a new vision-analysis skill with a chart-qa mode for visual-enrichment QA, and a candidate-match-report v2.0 that forks MiniMax’s 15-cover-pattern design system. At session close: 42 skill files, 0 skills.* scanner failures.

The arc was three moves in order. Measure: the MiniMax benchmark surfaced that our version stamps had rotted, with 38/40 skills carrying the same 260413 date despite obvious edits since. Repair: V1.1-V1.6 swept the rot (cross-refs restored, audit-definitions template placeholder fixed, 41 changelogs migrated to the ### YYMMDD: H3 format per versioning-standard rule 2, Quality Checks backfilled on 20 skills). Gate: V2.1-V2.3 installed the enforcement layer the repair unlocked. This follows pattern_repair_first_then_tighten.md: sweep first, then promote the scanner to ERROR once the field is clean.
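The changelog migration in the repair step is easiest to picture as a small transform. The sketch below is not the actual /tmp/migrate-skill-changelogs.py. It assumes the legacy rows look like | YYYY-MM-DD | note |, and it assumes that when several rows share a day, the first note becomes the heading title and the rest become bullets; the real script may have chosen titles differently. The function name is hypothetical.

```python
import re

# Assumed legacy row shape: "| 2026-04-13 | some note |"
ROW_RE = re.compile(r"^\|\s*(\d{4})-(\d{2})-(\d{2})\s*\|\s*(.*?)\s*\|\s*$")


def migrate_changelog(lines):
    """Convert legacy table rows into '### YYMMDD: Title' H3 blocks.

    Rows sharing a date are consolidated under one heading.
    Lines that are not dated rows (table header, separator) are dropped.
    """
    by_day = {}  # dicts keep insertion order, so day order is preserved
    for line in lines:
        match = ROW_RE.match(line)
        if match is None:
            continue
        year, month, day, note = match.groups()
        stamp = f"{year[2:]}{month}{day}"  # 2026-04-13 -> 260413
        by_day.setdefault(stamp, []).append(note)

    out = []
    for stamp, notes in by_day.items():
        # Assumption: first note is the title, remaining notes are bullets.
        out.append(f"### {stamp}: {notes[0]}")
        out.extend(f"- {note}" for note in notes[1:])
        out.append("")
    return out


if __name__ == "__main__":
    legacy = [
        "| Date | Note |",
        "|---|---|",
        "| 2026-04-13 | Added trigger |",
        "| 2026-04-13 | Fixed xref |",
    ]
    print("\n".join(migrate_changelog(legacy)))
```

Keeping the transform line-oriented and anchored at both ends of the row is deliberate: it never touches text that is not a dated table row.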

Evidence

V2.1 Rust scanner shipped (9 scanners, 285 LOC): Added the skills module at crates/vault-scanners/src/skills.rs with 6 ERROR-level checks (name-not-title, trigger-present, last_audited-valid, self-improvement-xref, required-sections, no-template-placeholders) and 3 WARN-level checks (last_audited-freshness, canonical-fields-present, changelog-has-row). Integrated via the existing 72-scanner registry pattern, so no new CLI surface was needed; rv audit --dimension skills picks it up automatically. 14/14 scanner tests green, including a new end-to-end fixture test.
V1.5 Quality Checks backfill: 20 skills, with skill-specific content per Principle P1 rather than a generic template. karpathy-ratchet got “immutable evaluator intact, 5% regression guard active, results log captures every experiment including discards”; j-score-v2-matching got “all 3 layers fire, composite ∈ [0,100], score_audit row written for every invocation, 5-layer floor enforcement verified”; candidate-match-report got “all 20 sections rendered, no PII in output, brand tokens applied, match score prominent on cover.” Each skill’s Quality Checks match how that skill is actually used.
V1.6 changelog format migration (all 41 skills): /tmp/migrate-skill-changelogs.py converted | YYYY-MM-DD | note | table rows to ### YYMMDD: Title H3 blocks. Same-day rows were consolidated into one H3 with bullets per rule 3. The template ~/vault/_templates/skill.md was updated so new skills ship in the canonical format. The skills.changelog_has_row scanner regex accepts both formats during the transition.
V5.2 vision-analysis skill created: six modes (describe, ocr, ui-review, chart-data, object-detect, and the vault-specific chart-qa). The chart-qa mode closes the QA loop on wiki-visual-enrichment’s R chart output: it checks 6 criteria (title, axes, legend, overplotting, colorblind-safe, source caption) and returns a machine-readable JSON issues array so the enrichment pipeline can gate on it. Mode taxonomy forked from MiniMax-AI/skills vision-analysis (MIT, attributed in sources).
V5.1 candidate-match-report v2.0 (forked MiniMax minimax-pdf design system): Shipped ~/vault/skills/candidate-match-report/design-system/cover-patterns.json (15 patterns from report to terminal to magazine), design-system/color-matrix.json (8 industry palettes with defensible per-industry accent colors), schemas/content-block.schema.json (20 block types including match_score_banner and skill_gap_matrix extensions). MIT attribution in sources field. The v2.0 pipeline reads these data files instead of hardcoding visual identity.
V4.1 CLI shims for 7 skills: scripts/run.sh at each of candidate-match-report, bloomnet-refresh, r-visualization-pipeline, j-score-v2-matching, karpathy-ratchet, resume-tailoring, article-cluster-generator. All --help exits 0, all use only env + explicit flags (no hardcoded vault paths), all marked portable: true in frontmatter so future “open source a subset” flow can extract cleanly.
V2.2 pre-commit version-bump hook: Bash layer in ~/vault/.git/hooks/pre-commit blocks commits that stage skills/*.md files when the version: stamp is not today’s YYMMDD or when the changelog lacks an entry for today. Runs before rv index / rv audit. The bypass instruction (git commit --no-verify) is surfaced only when the hook blocks.
V2.3 staleness alert (~/.claude/hooks/vault-skills-staleness.sh): advisory script that scans skills for last_audited older than 30 days. Fails open (always exits 0). User picks the cadence: SessionStart hook, weekly launchd job, or cron.
V6.1 + V6.2 polish: All 41 skills got license: private and sources: [] frontmatter; 11 backed-by-real-code skills got skill_path: pointers populated; _config/frontmatter-schema.yaml updated with the new optional fields and license_values enum.
V5.3 deferred per Principle P1: Ran the corpus query. ~/.claude/bloomnet-events.jsonl has 13,044 events but zero skill_invoked events (invocations are not instrumented). Rewriting triggers from intuition would hyper-tune them to my interpretation, which is exactly the failure mode P1 prevents. Added a V5.3a instrumentation ticket; V5.3 execution waits 30 days after instrumentation for the corpus to accumulate.
V3.1 partial ship: Companion directory pattern proven on audit-projects. ~/vault/skills/audit-projects/references/checks.md created as a type: topic frame (auto-indexed, passes scanners). Main audit-projects.md got a ## Companion References pointer. Full content extraction of the 13 check blocks is deferred to next session. The work is mechanical but content-sensitive, and writing extraction scripts in a fatigued session tail is how silent errors slip in (the V6.1 regex bug earlier in this session was exactly that class of mistake).
Two incidents mid-session, both with root-cause repairs: (1) The V6.1 script’s non-greedy regex corrupted 18 bullet changelogs by splitting words at _ characters; repaired via /tmp/repair-v6-corruption.py. (2) A stale DB frame for the deleted agent-kg-competitive-crawl.md kept firing the scanner even though the file was gone; marked the frame superseded via SQL and added rv index --gc to the future-work list.
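For reference, the V2.3 staleness check listed above reduces to a date comparison plus a fail-open policy. The real hook is a shell script; this is only a Python sketch of the logic. It assumes last_audited is stored in frontmatter as an ISO YYYY-MM-DD date (the journal does not state the format), and the function name and the name-to-text mapping are hypothetical.

```python
import re
from datetime import date

# Assumption: frontmatter carries a line like "last_audited: 2026-03-01".
LAST_AUDITED_RE = re.compile(
    r"^last_audited:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE
)


def stale_skills(skills, today, max_age_days=30):
    """Return sorted names of skills audited more than max_age_days ago.

    `skills` maps a skill name to its markdown text. Missing or invalid
    dates are skipped rather than reported, mirroring the advisory,
    fail-open posture: this check should never block anything.
    """
    stale = []
    for name, text in sorted(skills.items()):
        match = LAST_AUDITED_RE.search(text)
        if match is None:
            continue
        try:
            audited = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # e.g. month 13; validity is a separate scanner's job
        if (today - audited).days > max_age_days:
            stale.append(name)
    return stale


if __name__ == "__main__":
    demo = {"old-skill": "---\nlast_audited: 2026-03-01\n---\n"}
    for name in stale_skills(demo, date(2026, 4, 19)):
        print(f"advisory: {name} not audited in over 30 days")
```

Skipping unparseable dates keeps the alert from overlapping with the ERROR-level last_audited-valid scanner, which already owns that failure.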

So What

The skills dimension went from “rigorous in concept, undisciplined in enforcement” to “rigorous with a Rust gate and a repair-done field.” The MiniMax benchmark didn’t tell us to become MiniMax; it surfaced that our version-stamp discipline, which we had written standards for, wasn’t actually enforced. Half the fix was infrastructure the vault-engine already supported (the existing rv audit + scanner registry was sized exactly right for a new skills module), and half was content work (V1.5 backfilled 20 Quality Checks sections, each specific to how its skill is actually used). Principle P1 (corpus-informed evolution, not intuition) kept two tickets from shipping: V5.3 because we have no skill-invocation telemetry, and full V3.1 extraction because it deserved a focused session rather than a tail-end grind. Shipping speed is not the win; shipping the right thing with receipts is.

The MiniMax benchmark lands as a one-directional borrow: we took their 15-cover-pattern catalog, color-by-industry matrix, content-block schema, and mode taxonomy for vision-analysis. We didn’t take their pipeline shape (their make.sh is fine for their use case, mismatched with ours) or their lack of provenance tracking. Our provenance-with-metrics discipline genuinely beats theirs: every skill cites usage numbers and experiment links, which MiniMax doesn’t do. That discipline is why Phase 6’s polish tickets are cheap: the license/sources/skill_path fields just document what we already know; they don’t ask us to go find it.

Deferred work carries its own restart instructions. V3.1’s next steps are documented inside ~/vault/skills/audit-projects/references/checks.md; V5.3a’s instrumentation is specified in the plan with the exact event-type struct and hook-wiring path; V3.1 for audit-journal and audit-research is a recipe repeat once audit-projects extraction is verified. None of these are “TODO someday”; they’re “here’s exactly where you left off.”

What’s Next

Three items carry forward, in priority order. (1) V3.1 full extraction of audit-projects’ 13 check blocks into the companion references file, then the same for audit-journal and audit-research. This is the most mechanical work, and it unblocks the progressive-disclosure context-window win on the heaviest audit skills. (2) V5.3a instrumentation: wire EventType::SkillInvoked into crates/events and emit it from Claude Code’s skill-invocation path. After shipping, wait 30 days for corpus accumulation, then execute V5.3 trigger enrichment with real data. (3) Add rv index --gc to vault-cli to mark frames as superseded when their source file has been deleted. This session hit the stale-frame bug once, and it will keep biting until addressed.
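The rv index --gc item is small enough to sketch ahead of time. The real change belongs in the Rust CLI against the frame database; the Python below only illustrates the intended logic. The frame fields (id, source_path, status) and the "active" status value are hypothetical, since the journal only names the superseded state.

```python
from pathlib import Path


def gc_frames(frames, exists=lambda p: Path(p).exists()):
    """Mark frames whose source file is gone as 'superseded'.

    `frames` is a list of dicts with hypothetical keys id, source_path,
    and status. `exists` is injectable so the logic can be exercised
    without touching the filesystem. Returns the ids that were changed.
    """
    changed = []
    for frame in frames:
        if frame["status"] == "superseded":
            continue  # already retired; keeps repeated runs idempotent
        if not exists(frame["source_path"]):
            frame["status"] = "superseded"
            changed.append(frame["id"])
    return changed


if __name__ == "__main__":
    frames = [
        {"id": 1, "source_path": "definitely-missing-file.md", "status": "active"},
    ]
    print("superseded frame ids:", gc_frames(frames))
```

Marking rather than deleting matches the manual SQL repair from this session, so history stays queryable while the scanners stop firing on dead files.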

A second-order win to track next week: whether any skills start failing the V2.2 pre-commit version-bump hook on the first edit after today. Per Principle P1, the hook is new and untested against real drift patterns; the next false positive or false negative tells us how the hook should evolve.
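To reason about where false positives could come from, it helps to write the hook's rule down as a pure function. The installed hook is Bash; this is a Python sketch of the same two conditions (version stamp equals today's YYMMDD, changelog has a heading for today). It assumes the stamp lives on a frontmatter line like version: 260419, optionally quoted, and that changelog entries use the ### YYMMDD: heading form. The function name is hypothetical.

```python
import re
from datetime import date

# Assumption: frontmatter line "version: 260419", quotes optional.
VERSION_RE = re.compile(r"""^version:\s*["']?(\d{6})["']?\s*$""", re.MULTILINE)


def version_bump_problems(text, today):
    """Return the reasons a staged skill file would be blocked.

    An empty list means the commit may proceed.
    """
    stamp = today.strftime("%y%m%d")  # 2026-04-19 -> "260419"
    problems = []

    match = VERSION_RE.search(text)
    if match is None:
        problems.append("missing version: stamp")
    elif match.group(1) != stamp:
        problems.append(f"version is {match.group(1)}, expected {stamp}")

    # The changelog needs an H3 entry dated today.
    if not re.search(rf"^### {stamp}:", text, re.MULTILINE):
        problems.append(f"changelog has no ### {stamp}: entry")
    return problems


if __name__ == "__main__":
    stale = "---\nversion: 260413\n---\n## Changelog\n### 260413: Old\n"
    for problem in version_bump_problems(stale, date(2026, 4, 19)):
        print("blocked:", problem)
```

Because the rule compares against the wall-clock date, one plausible false positive to watch for is an edit stamped late in the evening and committed after midnight.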

Log

Scope: Skills dimension repair + gate installation. 42 skill files audited, 14 tickets shipped, 4 deferred with restart instructions.
Artifacts shipped: rusty-bloomnet/crates/vault-scanners/src/skills.rs (new, 285 LOC, 9 scanners); ~/.claude/hooks/vault-skills-staleness.sh; updated ~/vault/.git/hooks/pre-commit; new skill vision-analysis.md; candidate-match-report/{design-system,schemas,scripts}/ with 5 data files + 1 CLI shim; 6 other <skill>/scripts/run.sh shims.
Content touched: 41 skill changelogs migrated (V1.6), 18 skills got Quality Checks (V1.5), 2 audit skills got Provenance+Usage (V1.5), 41 skills got license+sources (V6.1), 11 skills got skill_path (V6.2), 5 skills got self-improvement cross-refs (V1.3), 2 short skills rewritten to canonical body (V1.4).
Benchmark: MiniMax-AI/skills @ HEAD (17 public + 1 internal pr-review skill). Comparison documented in the reference_minimax_skills_benchmark.md memory file and in plan §1.
Rust workspace: Cargo.toml bumped 0.260413.0 → 0.260420.0. rusty-bloomnet/CHANGELOG.md gained ## 260420: Skills Scanner Module (V2.1) entry.
Tests: 14/14 vault-scanners tests pass including new test_skills_scanners_detect_rot end-to-end fixture.
Final audit: rv audit --dimension skills → 0 skills.* failures across 47 frames + 168 warnings (all informational).
Plan + memory: ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md carries 15 tickets + 10-entry changelog. Memory files updated: project_vault_architecture.md (living-doc), reference_minimax_skills_benchmark.md (benchmark reference), MEMORY.md index.

Restart Instructions (for next session)

V3.1 full extraction: Read ~/vault/skills/audit-projects/references/checks.md, follow the 5-step next-session recipe at the bottom. Extract 13 check blocks from audit-projects.md into the companion file; shrink main skill body accordingly. Repeat for audit-journal + audit-research.
V5.3a instrumentation: Per plan §5 V5.3 corpus-gate block: add EventType::SkillInvoked { skill_name, session_id, trigger_match_score? } to rusty-bloomnet/crates/events/src/event.rs, emit from Claude Code skill-invocation hook (likely UserPromptSubmit parsing /<skill-name> patterns), ship, wait 30 days for corpus.
rv index --gc: Add flag to rusty-bloomnet/crates/vault-cli/src/main.rs::cmd_index that marks frames as superseded when their source file no longer exists. ~20 LOC.
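For the V5.3a item above, here is a rough picture of the data flow, written in Python even though the event type itself will live in the Rust events crate. The field names follow the SkillInvoked shape given above; the JSON layout (a type key holding "skill_invoked"), the slash-command regex, and both function names are assumptions for illustration, not the plan's specification.

```python
import json
import re

# Assumption: skills are invoked as "/skill-name" at the start of the
# prompt or after whitespace, which avoids matching paths like ~/vault.
SLASH_SKILL_RE = re.compile(r"(?:^|\s)/([a-z0-9][a-z0-9-]*)\b")


def skill_invoked_events(prompt, session_id, known_skills):
    """Build one skill_invoked event per known /skill-name in the prompt."""
    events = []
    for name in SLASH_SKILL_RE.findall(prompt):
        if name not in known_skills:
            continue  # e.g. "/tmp" in a path is not a skill
        events.append({
            "type": "skill_invoked",      # hypothetical serialized tag
            "skill_name": name,
            "session_id": session_id,
            "trigger_match_score": None,  # optional field, unset here
        })
    return events


def count_skill_invocations(jsonl_lines):
    """Count skill_invoked events per skill, ignoring unparseable lines."""
    counts = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") == "skill_invoked":
            name = event.get("skill_name", "?")
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    events = skill_invoked_events("/vision-analysis chart.png", "s1", {"vision-analysis"})
    print(count_skill_invocations(json.dumps(e) for e in events))
```

The counting half is the corpus gate: V5.3 should only start once this returns non-trivial counts over the 30-day window, instead of the empty result the current log would produce.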

Plan is the source of truth: ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md.