Skills-dimension upgrade: MiniMax benchmark, Rust scanner, 14 tickets shipped, 4 deferred with restart instructions
Signal
Audited the skills dimension against the MiniMax-AI/skills benchmark, shipped 14 tickets across Phases 1-6 of a written upgrade plan, and deferred 4 tickets with concrete next-session restart instructions. The vault now has a 9-scanner Rust gate (rusty-bloomnet/crates/vault-scanners/src/skills.rs) that runs under rv audit --dimension skills, a pre-commit hook blocking stale version stamps, a daily staleness alert, a new vision-analysis skill with a chart-qa mode for visual-enrichment QA, and a candidate-match-report v2.0 that forked MiniMax’s 15-cover-pattern design system. 42 skill files, 0 skills.* scanner failures at session close.
The arc was three moves in order: measure (MiniMax benchmark surfaced that our version stamps had rotted: 38/40 skills carried the same 260413 date despite obvious edits since), repair (V1.1-V1.6 swept the rot: cross-refs restored, audit-definitions template placeholder fixed, 41 changelogs migrated to ### YYMMDD: H3 format per versioning-standard rule 2, Quality Checks backfilled on 20 skills), then gate (V2.1-V2.3 installed the enforcement layer the repair unlocked). Per pattern_repair_first_then_tighten.md: sweep first, promote scanner to ERROR after the field is clean.
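The measure step can be illustrated with a small sketch. This is not the audit tooling itself; `dominant_stamp` is a hypothetical helper, and the two non-260413 stamps in the usage note are invented for illustration. It reports how much of the field shares a single version stamp, which is the signal that exposed the rot.

```python
from collections import Counter


def dominant_stamp(stamps: list) -> tuple:
    """Return (stamp, count, share) for the most common version stamp.

    A share close to 1.0 on a field that has clearly been edited since that
    date suggests the stamps are not being bumped.
    """
    if not stamps:
        raise ValueError("no stamps to measure")
    stamp, count = Counter(stamps).most_common(1)[0]
    return stamp, count, count / len(stamps)
```

Fed 38 copies of `"260413"` plus two other stamps, it returns `("260413", 38, 0.95)`, matching the 38/40 figure above.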
Evidence
- V2.1 Rust scanner shipped (9 scanners, 285 LOC): Added `skills` module to `crates/vault-scanners/src/skills.rs` with 6 ERROR-level checks (name-not-title, trigger-present, last_audited-valid, self-improvement-xref, required-sections, no-template-placeholders) and 3 WARN-level checks (last_audited-freshness, canonical-fields-present, changelog-has-row). Integrated via the existing 72-scanner registry pattern, so no new CLI surface was needed; `rv audit --dimension skills` picks it up automatically. 14/14 scanner tests green, including a new end-to-end fixture test.
- V1.5 Quality Checks backfill, 20 skills: Skill-specific content per Principle P1, not a generic template. karpathy-ratchet got “immutable evaluator intact, 5% regression guard active, results log captures every experiment including discards”; j-score-v2-matching got “all 3 layers fire, composite ∈ [0,100], score_audit row written for every invocation, 5-layer floor enforcement verified”; candidate-match-report got “all 20 sections rendered, no PII in output, brand tokens applied, match score prominent on cover.” Each skill’s Quality Checks match how that skill is actually used.
- V1.6 changelog format migration, all 41 skills: `/tmp/migrate-skill-changelogs.py` converted `| YYYY-MM-DD | note |` table rows to `### YYMMDD: Title` H3 blocks. Same-day rows consolidated into one H3 with bullets per rule 3. Template `~/vault/_templates/skill.md` updated so new skills ship in the canonical format. Scanner `skills.changelog_has_row` regex accepts both formats during transition.
- V5.2 vision-analysis skill created: Six modes: describe, ocr, ui-review, chart-data, object-detect, and vault-specific chart-qa. The chart-qa mode closes the QA loop on `wiki-visual-enrichment`’s R chart output: 6 criteria (title, axes, legend, overplotting, colorblind-safe, source caption), returns machine-readable JSON issues array so the enrichment pipeline can gate. Mode taxonomy forked from MiniMax-AI/skills vision-analysis (MIT, attributed in sources).
- V5.1 candidate-match-report v2.0, forked MiniMax minimax-pdf design system: Shipped `~/vault/skills/candidate-match-report/design-system/cover-patterns.json` (15 patterns from report to terminal to magazine), `design-system/color-matrix.json` (8 industry palettes with defensible per-industry accent colors), `schemas/content-block.schema.json` (20 block types including match_score_banner and skill_gap_matrix extensions). MIT attribution in sources field. The v2.0 pipeline reads these data files instead of hardcoding visual identity.
- V4.1 CLI shims for 7 skills: `scripts/run.sh` at each of candidate-match-report, bloomnet-refresh, r-visualization-pipeline, j-score-v2-matching, karpathy-ratchet, resume-tailoring, article-cluster-generator. All `--help` exits 0, all use only env + explicit flags (no hardcoded vault paths), all marked `portable: true` in frontmatter so future “open source a subset” flow can extract cleanly.
- V2.2 pre-commit version-bump hook: Bash layer in `~/vault/.git/hooks/pre-commit` blocks staged `skills/*.md` commits when the `version:` stamp is not today’s YYMMDD or when the changelog lacks a today entry. Runs before `rv index` / `rv audit`. Bypass instruction (`git commit --no-verify`) surfaced only when blocking.
- V2.3 staleness alert: `~/.claude/hooks/vault-skills-staleness.sh`, an advisory script scanning skills for `last_audited > 30d`. Fails open (exit 0 always). User picks cadence: SessionStart hook, launchd weekly, or cron.
- V6.1 + V6.2 polish: All 41 skills got `license: private` and `sources: []` frontmatter; 11 backed-by-real-code skills got `skill_path:` pointers populated; `_config/frontmatter-schema.yaml` updated with the new optional fields and `license_values` enum.
- V5.3 deferred per Principle P1: Ran the corpus query. `~/.claude/bloomnet-events.jsonl` has 13,044 events but zero `skill_invoked` events (not instrumented). Rewriting triggers from intuition would hyper-tune to my interpretation, exactly the failure mode P1 prevents. Added V5.3a instrumentation ticket; V5.3 execution waits 30 days post-instrument for corpus accumulation.
- V3.1 partial ship: Companion directory pattern proven on audit-projects. `~/vault/skills/audit-projects/references/checks.md` created as a `type: topic` frame (auto-indexed, passes scanners). Main `audit-projects.md` got a `## Companion References` pointer. Full content extraction of 13 check blocks deferred to next session: it’s mechanical but content-sensitive, and writing extraction scripts in a fatigued session tail is how silent errors slip in (V6 regex bug earlier in this session was exactly that class of mistake).
- Two incidents mid-session with root-cause repairs: (1) V6.1 script’s non-greedy regex corrupted 18 bullet changelogs by splitting words at `_` characters; repaired via `/tmp/repair-v6-corruption.py`. (2) Stale DB frame for deleted `agent-kg-competitive-crawl.md` kept firing scanner even though file was gone; marked frame `superseded` via SQL and added `rv index --gc` to future-work list.
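The V1.6 changelog migration in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of `/tmp/migrate-skill-changelogs.py`. It assumes rows shaped like `| 2026-04-20 | note |`, and it reuses the first note of each day as the H3 title, which is a guess at how titles are derived. `migrate_changelog` and `ROW_RE` are hypothetical names.

```python
import re

# Assumed row shape: "| 2026-04-20 | some note |".
# Table header and separator rows do not match and are skipped.
ROW_RE = re.compile(r"^\|\s*(\d{4})-(\d{2})-(\d{2})\s*\|\s*(.*?)\s*\|\s*$")


def migrate_changelog(table_text: str) -> str:
    """Convert date/note table rows into '### YYMMDD: Title' blocks.

    Rows sharing a date are consolidated under one H3, one bullet per note.
    The H3 title reuses the first note of that day (an assumption; the real
    script may derive titles differently).
    """
    by_day = {}  # insertion-ordered: YYMMDD stamp -> list of notes
    for line in table_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        year, month, day, note = match.groups()
        stamp = f"{year[2:]}{month}{day}"
        by_day.setdefault(stamp, []).append(note)

    blocks = []
    for stamp, notes in by_day.items():
        lines = [f"### {stamp}: {notes[0]}"]
        lines.extend(f"- {note}" for note in notes)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

Keeping the conversion a pure string-to-string function makes it straightforward to run against a fixture before touching all 41 files.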
So What
The skills dimension went from “rigorous in concept, undisciplined in enforcement” to “rigorous with a Rust gate and a repair-done field.” The MiniMax benchmark didn’t tell us to become MiniMax; it surfaced that our version-stamp discipline, which we had written standards for, wasn’t actually enforced. Half the fix was infrastructure the vault-engine already supported (the existing rv audit + scanner registry was sized exactly right for a new skills module), and half was content work (V1.5 backfilled 20 Quality Checks sections, each specific to how its skill is actually used). Principle P1 (corpus-informed evolution, not intuition) kept two tickets from shipping: V5.3 because we have no skill-invocation telemetry, and full V3.1 extraction because it deserved a focused session rather than a tail-end grind. Shipping speed is not the win; shipping the right thing with receipts is.
The MiniMax benchmark lands as a one-directional borrow: we took their 15-cover-pattern catalog, color-by-industry matrix, content-block schema, and mode taxonomy for vision-analysis. We didn’t take their pipeline shape (their make.sh is fine for their use case, mismatched with ours) or their lack of provenance tracking. Our provenance-with-metrics discipline genuinely beats theirs: every skill cites usage numbers and experiment links, which MiniMax doesn’t do. That discipline is why Phase 6’s polish tickets are cheap: license/sources/skill_path fields just document what we already know, they don’t ask us to find it.
Deferred work carries its own restart instructions. V3.1’s next steps are documented inside ~/vault/skills/audit-projects/references/checks.md; V5.3a’s instrumentation is specified in the plan with the exact event-type struct and hook-wiring path; V3.1 for audit-journal and audit-research is a recipe repeat once audit-projects extraction is verified. None of these are “TODO someday”: they’re “here’s exactly where you left off.”
What’s Next
Three items carry forward, in priority order. (1) V3.1 full extraction of audit-projects’ 13 check blocks into the companion references file, then replicate for audit-journal and audit-research. This is the most mechanical work, and it unblocks the progressive-disclosure context-window win on the heaviest audit skills. (2) V5.3a instrumentation: wire EventType::SkillInvoked into crates/events and emit from Claude Code’s skill-invocation path. After shipping, wait 30 days for corpus accumulation, then execute V5.3 trigger enrichment with real data. (3) Add rv index --gc to vault-cli to mark frames as superseded when their source file has been deleted. This session hit the stale-frame bug once, and it will keep biting until addressed.
A second-order win to track next week: whether any skills start failing the V2.2 pre-commit version-bump hook on the first edit after today. Per Principle P1, the hook is new and untested against real drift patterns; the next false positive or false negative tells us how the hook should evolve.
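To make the hook's two blocking conditions concrete, here is a sketch of the check in Python. The real hook is a Bash layer whose code is not shown in this note. The sketch assumes a frontmatter line of the form `version: YYMMDD` and changelog headings of the form `### YYMMDD: Title`; `version_bump_problems` is a hypothetical name.

```python
import re
from datetime import date

# Assumed frontmatter shape: "version: 260420" (optionally quoted).
VERSION_RE = re.compile(r"^version:\s*['\"]?(\d{6})['\"]?\s*$", re.MULTILINE)


def version_bump_problems(skill_text: str, today: date) -> list:
    """Return reasons a staged skill file would be blocked; empty means pass."""
    stamp = today.strftime("%y%m%d")
    problems = []

    match = VERSION_RE.search(skill_text)
    if match is None:
        problems.append("no version: stamp found")
    elif match.group(1) != stamp:
        problems.append(f"version: is {match.group(1)}, expected {stamp}")

    # The changelog must carry an H3 entry dated today.
    if not re.search(rf"^### {stamp}:", skill_text, re.MULTILINE):
        problems.append(f"changelog lacks a ### {stamp}: entry")
    return problems
```

Expressed as a function from file text and date to a list of reasons, each false positive or false negative observed in practice can be replayed as a one-line test case.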
Log
- Scope: Skills dimension repair + gate installation. 42 skill files audited, 14 tickets shipped, 4 deferred with restart instructions.
- Artifacts shipped: `rusty-bloomnet/crates/vault-scanners/src/skills.rs` (new, 285 LOC, 9 scanners); `~/.claude/hooks/vault-skills-staleness.sh`; updated `~/vault/.git/hooks/pre-commit`; new skill `vision-analysis.md`; `candidate-match-report/{design-system,schemas,scripts}/` with 5 data files + 1 CLI shim; 6 other `<skill>/scripts/run.sh` shims.
- Content touched: 41 skill changelogs migrated (V1.6), 18 skills got Quality Checks (V1.5), 2 audit skills got Provenance+Usage (V1.5), 41 skills got license+sources (V6.1), 11 skills got skill_path (V6.2), 5 skills got self-improvement cross-refs (V1.3), 2 short skills rewritten to canonical body (V1.4).
- Benchmark: MiniMax-AI/skills @ HEAD (17 public + 1 internal pr-review skill). Comparison documented at `reference_minimax_skills_benchmark.md` memory and the plan §1.
- Rust workspace: Cargo.toml bumped `0.260413.0` → `0.260420.0`. `rusty-bloomnet/CHANGELOG.md` gained `## 260420: Skills Scanner Module (V2.1)` entry.
- Tests: 14/14 vault-scanners tests pass including new `test_skills_scanners_detect_rot` end-to-end fixture.
- Final audit: `rv audit --dimension skills` → 0 skills.* failures across 47 frames + 168 warnings (all informational).
- Plan + memory: `~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md` carries 15 tickets + 10-entry changelog. Memory files updated: `project_vault_architecture.md` (living-doc), `reference_minimax_skills_benchmark.md` (benchmark reference), `MEMORY.md` index.
Restart Instructions (for next session)
- V3.1 full extraction: Read `~/vault/skills/audit-projects/references/checks.md`, follow the 5-step next-session recipe at the bottom. Extract 13 check blocks from `audit-projects.md` into the companion file; shrink main skill body accordingly. Repeat for audit-journal + audit-research.
- V5.3a instrumentation: Per plan §5 V5.3 corpus-gate block: add `EventType::SkillInvoked { skill_name, session_id, trigger_match_score? }` to `rusty-bloomnet/crates/events/src/event.rs`, emit from Claude Code skill-invocation hook (likely `UserPromptSubmit` parsing `/<skill-name>` patterns), ship, wait 30 days for corpus.
- `rv index --gc`: Add flag to `rusty-bloomnet/crates/vault-cli/src/main.rs::cmd_index` that marks frames as `superseded` when their source file no longer exists. ~20 LOC.
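The intended `--gc` behavior can be sketched in Python before writing the Rust. This only illustrates the rule; it is not the vault-cli implementation. The frame shape used here (a dict with `id`, `path` relative to the vault root, and `status`) is hypothetical and stands in for whatever the frame table actually stores.

```python
from pathlib import Path


def gc_frames(frames: list, vault_root: Path) -> list:
    """Mark frames whose source file is gone as 'superseded'; return their ids.

    Each frame is a dict with hypothetical keys 'id', 'path' (relative to the
    vault root) and 'status'. Frames already superseded are left untouched,
    so running the pass twice is a no-op the second time.
    """
    marked = []
    for frame in frames:
        if frame["status"] == "superseded":
            continue
        if not (vault_root / frame["path"]).exists():
            frame["status"] = "superseded"
            marked.append(frame["id"])
    return marked
```

Returning the affected ids lets the caller report what was collected, which would have made the stale `agent-kg-competitive-crawl.md` frame visible without a manual SQL fix.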
Plan is the source of truth: ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md.