Journal

Skills-dimension upgrade: MiniMax benchmark, Rust scanner, 14 tickets shipped, 4 deferred with restart instructions

skills, vault-scanners, minimax-benchmark, rust-migration

Signal

Audited the skills dimension against the MiniMax-AI/skills benchmark, shipped 14 tickets across Phases 1-6 of a written upgrade plan, and deferred 4 tickets with concrete next-session restart instructions. The vault now has a 9-scanner Rust gate (rusty-bloomnet/crates/vault-scanners/src/skills.rs) that runs under rv audit --dimension skills, a pre-commit hook blocking stale version stamps, a daily staleness alert, a new vision-analysis skill with a chart-qa mode for visual-enrichment QA, and a candidate-match-report v2.0 that forked MiniMax’s 15-cover-pattern design system. 42 skill files, 0 skills.* scanner failures at session close.

The arc was three moves in order: measure (MiniMax benchmark surfaced that our version stamps had rotted: 38/40 skills carried the same 260413 date despite obvious edits since), repair (V1.1-V1.6 swept the rot: cross-refs restored, audit-definitions template placeholder fixed, 41 changelogs migrated to ### YYMMDD: H3 format per versioning-standard rule 2, Quality Checks backfilled on 20 skills), then gate (V2.1-V2.3 installed the enforcement layer the repair unlocked). Per pattern_repair_first_then_tighten.md: sweep first, promote scanner to ERROR after the field is clean.
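The "measure" step can be illustrated as a frequency count over version stamps. This is a minimal sketch, not the tooling actually used for the benchmark: it assumes each skill file carries a frontmatter line of the form version: YYMMDD, and the helper names stamp_histogram and dominant_stamp_share are hypothetical.

```python
import re
from collections import Counter

# Assumption: the stamp appears as a frontmatter line like "version: 260413".
VERSION_RE = re.compile(r"^version:\s*(\d{6})\s*$", re.MULTILINE)


def stamp_histogram(skill_texts):
    """Count how many skills carry each version stamp (hypothetical helper)."""
    counts = Counter()
    for text in skill_texts:
        match = VERSION_RE.search(text)
        counts[match.group(1) if match else "missing"] += 1
    return counts


def dominant_stamp_share(skill_texts):
    """Return (stamp, share) for the most common stamp.

    A share close to 1.0 despite known edits suggests the stamps have rotted.
    """
    if not skill_texts:
        return None, 0.0
    stamp, n = stamp_histogram(skill_texts).most_common(1)[0]
    return stamp, n / len(skill_texts)
```

Fed the contents of every skill file, a result such as ("260413", 0.95) would correspond to the 38/40 figure above.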

Evidence

  • V2.1 Rust scanner shipped (9 scanners, 285 LOC): Added skills module to crates/vault-scanners/src/skills.rs with 6 ERROR-level checks (name-not-title, trigger-present, last_audited-valid, self-improvement-xref, required-sections, no-template-placeholders) and 3 WARN-level checks (last_audited-freshness, canonical-fields-present, changelog-has-row). Integrated via the existing 72-scanner registry pattern, so no new CLI surface was needed; rv audit --dimension skills picks it up automatically. 14/14 scanner tests green including a new end-to-end fixture test.
  • V1.5 Quality Checks backfill: 20 skills, with skill-specific content per Principle P1 rather than a generic template. karpathy-ratchet got “immutable evaluator intact, 5% regression guard active, results log captures every experiment including discards”; j-score-v2-matching got “all 3 layers fire, composite ∈ [0,100], score_audit row written for every invocation, 5-layer floor enforcement verified”; candidate-match-report got “all 20 sections rendered, no PII in output, brand tokens applied, match score prominent on cover.” Each skill’s Quality Checks match how that skill is actually used.
  • V1.6 changelog format migration (all 41 skills): /tmp/migrate-skill-changelogs.py converted table rows of the form | YYYY-MM-DD | note | into H3 blocks of the form ### YYMMDD: Title. Same-day rows consolidated into one H3 with bullets per rule 3. Template ~/vault/_templates/skill.md updated so new skills ship in the canonical format. Scanner skills.changelog_has_row regex accepts both formats during transition.
  • V5.2 vision-analysis skill created: Six modes: describe, ocr, ui-review, chart-data, object-detect, and vault-specific chart-qa. The chart-qa mode closes the QA loop on wiki-visual-enrichment’s R chart output: 6 criteria (title, axes, legend, overplotting, colorblind-safe, source caption), returns machine-readable JSON issues array so the enrichment pipeline can gate. Mode taxonomy forked from MiniMax-AI/skills vision-analysis (MIT, attributed in sources).
  • V5.1 candidate-match-report v2.0 (forked MiniMax minimax-pdf design system): Shipped ~/vault/skills/candidate-match-report/design-system/cover-patterns.json (15 patterns from report to terminal to magazine), design-system/color-matrix.json (8 industry palettes with defensible per-industry accent colors), schemas/content-block.schema.json (20 block types including match_score_banner and skill_gap_matrix extensions). MIT attribution in sources field. The v2.0 pipeline reads these data files instead of hardcoding visual identity.
  • V4.1 CLI shims for 7 skills: scripts/run.sh at each of candidate-match-report, bloomnet-refresh, r-visualization-pipeline, j-score-v2-matching, karpathy-ratchet, resume-tailoring, article-cluster-generator. All --help exits 0, all use only env + explicit flags (no hardcoded vault paths), all marked portable: true in frontmatter so future “open source a subset” flow can extract cleanly.
  • V2.2 pre-commit version-bump hook: Bash layer in ~/vault/.git/hooks/pre-commit blocks staged skills/*.md commits when the version: stamp is not today’s YYMMDD or when the changelog lacks a today entry. Runs before rv index / rv audit. Bypass instruction (git commit --no-verify) surfaced only when blocking.
  • V2.3 staleness alert: ~/.claude/hooks/vault-skills-staleness.sh is an advisory script that scans skills for last_audited older than 30 days. Fails open (exit 0 always). User picks cadence: SessionStart hook, launchd weekly, or cron.
  • V6.1 + V6.2 polish: All 41 skills got license: private and sources: [] frontmatter; 11 backed-by-real-code skills got skill_path: pointers populated; _config/frontmatter-schema.yaml updated with the new optional fields and license_values enum.
  • V5.3 deferred per Principle P1: Ran the corpus query. ~/.claude/bloomnet-events.jsonl has 13,044 events but zero skill_invoked events (not instrumented). Rewriting triggers from intuition would hyper-tune to my interpretation, which is exactly the failure mode P1 prevents. Added V5.3a instrumentation ticket; V5.3 execution waits 30 days post-instrument for corpus accumulation.
  • V3.1 partial ship: Companion directory pattern proven on audit-projects. ~/vault/skills/audit-projects/references/checks.md created as a type: topic frame (auto-indexed, passes scanners). Main audit-projects.md got a ## Companion References pointer. Full content extraction of 13 check blocks deferred to next session: it’s mechanical but content-sensitive, and writing extraction scripts in a fatigued session tail is how silent errors slip in (V6 regex bug earlier in this session was exactly that class of mistake).
  • Two incidents mid-session with root-cause repairs: (1) V6.1 script’s non-greedy regex corrupted 18 bullet changelogs by splitting words at _ characters: repaired via /tmp/repair-v6-corruption.py. (2) Stale DB frame for deleted agent-kg-competitive-crawl.md kept firing scanner even though file was gone: marked frame superseded via SQL; added rv index --gc to future-work list.
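The V1.6 table-to-H3 conversion can be sketched as follows. This is not the contents of /tmp/migrate-skill-changelogs.py; it is an illustration under stated assumptions: the function name migrate_changelog is hypothetical, and because this entry does not state how the H3 Title is chosen, the sketch uses the first note of a day as the title and turns later same-day notes into bullets.

```python
import re

# Matches data rows such as "| 2026-04-20 | Added Quality Checks |".
# Header and separator rows do not start with a date, so they do not match.
ROW_RE = re.compile(r"^\|\s*(\d{4})-(\d{2})-(\d{2})\s*\|\s*(.*?)\s*\|\s*$")


def migrate_changelog(lines):
    """Convert '| YYYY-MM-DD | note |' rows into '### YYMMDD: Title' blocks."""
    by_day = {}  # plain dicts keep insertion order, so day order is preserved
    for line in lines:
        m = ROW_RE.match(line)
        if not m:
            continue  # skip table header, separator, and non-table lines
        year, month, day, note = m.groups()
        stamp = f"{year[2:]}{month}{day}"
        by_day.setdefault(stamp, []).append(note)

    out = []
    for stamp, notes in by_day.items():
        # Assumption: first note becomes the title, the rest become bullets.
        out.append(f"### {stamp}: {notes[0]}")
        out.extend(f"- {note}" for note in notes[1:])
        out.append("")  # blank line between H3 blocks
    return out
```

Keeping the conversion a pure function over lines makes it easy to diff its output against a handful of hand-migrated changelogs before running it across all files, which is the kind of check that would catch a regex slip like the V6.1 one.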

So What

The skills dimension went from “rigorous in concept, undisciplined in enforcement” to “rigorous with a Rust gate and a repair-done field.” The MiniMax benchmark didn’t tell us to become MiniMax; it surfaced that our version-stamp discipline, which we had written standards for, wasn’t actually enforced. Half the fix was infrastructure the vault-engine already supported (the existing rv audit + scanner registry was sized exactly right for a new skills module), and half was content work (V1.5 backfilled 20 Quality Checks sections, each specific to how its skill is actually used). Principle P1 (corpus-informed evolution, not intuition) kept two tickets from shipping: V5.3 because we have no skill-invocation telemetry, and full V3.1 extraction because it deserved a focused session rather than a tail-end grind. Shipping speed is not the win; shipping the right thing with receipts is.

The MiniMax benchmark lands as a one-directional borrow: we took their 15-cover-pattern catalog, color-by-industry matrix, content-block schema, and mode taxonomy for vision-analysis. We didn’t take their pipeline shape (their make.sh is fine for their use case, mismatched with ours) or their lack of provenance tracking. Our provenance-with-metrics discipline genuinely beats theirs: every skill cites usage numbers and experiment links, which MiniMax doesn’t do. That discipline is why Phase 6’s polish tickets are cheap: license/sources/skill_path fields just document what we already know, they don’t ask us to find it.

Deferred work carries its own restart instructions. V3.1’s next steps are documented inside ~/vault/skills/audit-projects/references/checks.md; V5.3a’s instrumentation is specified in the plan with the exact event-type struct and hook-wiring path; V3.1 for audit-journal and audit-research is a recipe repeat once audit-projects extraction is verified. None of these are “TODO someday”: they’re “here’s exactly where you left off.”

What’s Next

Three items carry forward, in priority order. (1) V3.1 full extraction of audit-projects’ 13 check blocks into the companion references file, then replicate for audit-journal and audit-research. This is the most mechanical work and unblocks the progressive-disclosure context-window win on the heaviest audit skills. (2) V5.3a instrumentation: wire EventType::SkillInvoked into crates/events and emit from Claude Code’s skill-invocation path. After shipping, wait 30 days for corpus accumulation, then execute V5.3 trigger enrichment with real data. (3) Add rv index --gc to vault-cli to mark frames as superseded when their source file has been deleted. This session hit the stale-frame bug once and it will keep biting until addressed.
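The intended behavior of rv index --gc can be sketched as below. The real change is planned in Rust inside vault-cli against the frame database; this Python version only illustrates the rule. The frame shape (dicts with source_path and status keys) and the "active" status value are assumptions made for the example.

```python
import os


def gc_frames(frames, exists=os.path.exists):
    """Mark frames whose source file is gone as superseded.

    Assumption: each frame is a dict with hypothetical 'source_path' and
    'status' keys. Returns the source paths that were newly superseded.
    """
    changed = []
    for frame in frames:
        if frame["status"] != "superseded" and not exists(frame["source_path"]):
            frame["status"] = "superseded"
            changed.append(frame["source_path"])
    return changed
```

The exists parameter is injectable so the rule can be tested without touching the filesystem; frames that are already superseded are left alone, which keeps repeated runs idempotent.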

A second-order win to track next week: whether any skills start failing the V2.2 pre-commit version-bump hook on the first edit after today. Per Principle P1, the hook is new and untested against real drift patterns; the next false-positive or false-negative tells us how the hook should evolve.
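To reason about false positives and negatives, the hook's blocking rule can be restated as a pure function. The actual hook is a Bash layer in the pre-commit script; this Python sketch only mirrors the two conditions described in this entry (version stamp equals today's YYMMDD, and the changelog has an entry for today). The function name and the exact line formats it matches are assumptions.

```python
import re
from datetime import date


def version_bump_problems(skill_text, today=None):
    """Return the reasons a staged skill file would be blocked (empty = passes)."""
    stamp = (today or date.today()).strftime("%y%m%d")
    problems = []

    # Assumption: the stamp is a frontmatter line like "version: 260420".
    m = re.search(r"^version:\s*(\S+)\s*$", skill_text, re.MULTILINE)
    if m is None or m.group(1) != stamp:
        problems.append(f"version stamp is not {stamp}")

    # Assumption: changelog entries use the "### YYMMDD: Title" H3 format.
    if not re.search(rf"^### {stamp}:", skill_text, re.MULTILINE):
        problems.append(f"changelog has no {stamp} entry")

    return problems
```

One drift pattern this makes visible: a file edited just before midnight and committed just after would be blocked even though the stamp was correct when written.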

Log

  • Scope: Skills dimension repair + gate installation. 42 skill files audited, 14 tickets shipped, 4 deferred with restart instructions.
  • Artifacts shipped: rusty-bloomnet/crates/vault-scanners/src/skills.rs (new, 285 LOC, 9 scanners); ~/.claude/hooks/vault-skills-staleness.sh; updated ~/vault/.git/hooks/pre-commit; new skill vision-analysis.md; candidate-match-report/{design-system,schemas,scripts}/ with 5 data files + 1 CLI shim; 6 other <skill>/scripts/run.sh shims.
  • Content touched: 41 skill changelogs migrated (V1.6), 18 skills got Quality Checks (V1.5), 2 audit skills got Provenance+Usage (V1.5), 41 skills got license+sources (V6.1), 11 skills got skill_path (V6.2), 5 skills got self-improvement cross-refs (V1.3), 2 short skills rewritten to canonical body (V1.4).
  • Benchmark: MiniMax-AI/skills @ HEAD (17 public + 1 internal pr-review skill). Comparison documented at reference_minimax_skills_benchmark.md memory and the plan §1.
  • Rust workspace: Cargo.toml bumped 0.260413.0 → 0.260420.0. rusty-bloomnet/CHANGELOG.md gained ## 260420: Skills Scanner Module (V2.1) entry.
  • Tests: 14/14 vault-scanners tests pass including new test_skills_scanners_detect_rot end-to-end fixture.
  • Final audit: rv audit --dimension skills → 0 skills.* failures across 47 frames + 168 warnings (all informational).
  • Plan + memory: ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md carries 15 tickets + 10-entry changelog. Memory files updated: project_vault_architecture.md (living-doc), reference_minimax_skills_benchmark.md (benchmark reference), MEMORY.md index.
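For readers who have not seen skills.rs, the ERROR/WARN split the fixture test exercises might look roughly like this. It is a Python illustration of four of the check names listed in this entry, not a port of the Rust module. Several details are assumptions: flat key: value frontmatter, an ISO-format last_audited date, "{{" as the template-placeholder marker, and a 30-day freshness threshold borrowed from the staleness alert.

```python
from datetime import date, timedelta


def parse_frontmatter(text):
    """Parse flat 'key: value' frontmatter between '---' fences (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scan_skill(text, today):
    """Return (level, check_name) findings for a subset of the skills checks."""
    fm = parse_frontmatter(text)
    findings = []

    if not fm.get("trigger"):
        findings.append(("ERROR", "trigger-present"))

    audited = None
    try:
        # Assumption: last_audited is stored as YYYY-MM-DD.
        audited = date.fromisoformat(fm.get("last_audited", ""))
    except ValueError:
        findings.append(("ERROR", "last_audited-valid"))

    if "{{" in text:  # assumed placeholder syntax
        findings.append(("ERROR", "no-template-placeholders"))

    # Assumption: 30 days, matching the staleness alert's threshold.
    if audited is not None and today - audited > timedelta(days=30):
        findings.append(("WARN", "last_audited-freshness"))

    return findings
```

A rot fixture in this style is just a skill file that deliberately trips each check, with the test asserting the exact list of findings.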

Restart Instructions (for next session)

  1. V3.1 full extraction: Read ~/vault/skills/audit-projects/references/checks.md, follow the 5-step next-session recipe at the bottom. Extract 13 check blocks from audit-projects.md into the companion file; shrink main skill body accordingly. Repeat for audit-journal + audit-research.
  2. V5.3a instrumentation: Per plan §5 V5.3 corpus-gate block: add EventType::SkillInvoked { skill_name, session_id, trigger_match_score? } to rusty-bloomnet/crates/events/src/event.rs, emit from Claude Code skill-invocation hook (likely UserPromptSubmit parsing /<skill-name> patterns), ship, wait 30 days for corpus.
  3. rv index --gc: Add flag to rusty-bloomnet/crates/vault-cli/src/main.rs::cmd_index that marks frames as superseded when their source file no longer exists. ~20 LOC.
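The V5.3a event shape and the later corpus gate can be prototyped before the Rust work starts. The sketch below is an assumption-heavy illustration: the JSON key event_type, the rule that a skill is invoked by a leading /<skill-name> token, and both function names are hypothetical; only the skill_name, session_id, and optional trigger_match_score fields come from the plan as quoted above.

```python
import json
import re

# Assumption: an invocation is a leading "/<skill-name>" token followed by
# whitespace or end of prompt, so paths like "/tmp/foo" do not match.
SKILL_RE = re.compile(r"^/([a-z0-9][a-z0-9-]*)(?=\s|$)")


def skill_invoked_event(prompt, session_id, trigger_match_score=None):
    """Build a skill_invoked event dict, or None if the prompt invokes no skill."""
    m = SKILL_RE.match(prompt.strip())
    if m is None:
        return None
    event = {
        "event_type": "skill_invoked",  # hypothetical key name
        "skill_name": m.group(1),
        "session_id": session_id,
    }
    if trigger_match_score is not None:
        event["trigger_match_score"] = trigger_match_score
    return event


def count_skill_invocations(jsonl_lines):
    """Count skill_invoked events in an iterable of JSONL lines."""
    total = 0
    for line in jsonl_lines:
        if line.strip() and json.loads(line).get("event_type") == "skill_invoked":
            total += 1
    return total
```

Running count_skill_invocations over the events file after the 30-day wait gives the number that decides whether V5.3 has enough corpus to proceed.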

Plan is the source of truth: ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md.