Self-improving toolkit plugin: 3-axis taxonomy + 31 seed cases in one session
A full Claude plugin for self-improving agent patterns was scaffolded from scratch in a single session. The plugin encodes a 3-axis taxonomy (iteration-objective, anti-pattern, catastrophe-prevention), 8 iteration objectives with the Huang Constraint, 31 seed cases drawn from real incidents across all repos, and 4 skills plus an orchestrator agent.
What Happened
The problem: agent improvement patterns were scattered across vault pitfalls, breakthrough entries, and ad-hoc session notes. The karpathy-ratchet skill knew about metric optimization. The ralph-loop skill knew about dashboard QA. Neither knew about the other, and neither could classify a new improvement task into the right pattern.
The plugin unifies this into a structured taxonomy along three axes:
- Iteration objectives (IO-1 through IO-8). Each objective defines what “better” means for a class of agent task. IO-1 is dashboard QA (ralph-loop), IO-2 is metric optimization (karpathy-ratchet), IO-3 through IO-8 cover repair sweeps, schema migration, prompt refinement, benchmark calibration, coverage expansion, and architecture search respectively.
- Anti-patterns. 12 cataloged failure modes that recur when agents try to self-improve: gate softening, synthetic data substitution, artifact back-filling, fix-plan-as-fix, downstream workarounds, and others. Each anti-pattern is cross-referenced to the iteration objective where it most commonly occurs.
- Catastrophe prevention. 7 guard conditions that must hold before, during, and after any self-improvement loop. These map directly to the Stella quality hooks: catastrophic-guard categories, depth limits, IO ratio bounds.
The 31 seed cases are not synthetic examples. Each one is a real incident from the vault: the SKL placeholder-midnight reversal (gate softening), the bloomnet.db synthetic data incident (synthetic substitution), the 10-gap fix-plan incident (artifact back-filling). Every case is tagged with its iteration objective, the anti-pattern it triggered, and which catastrophe guard would have caught it.
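The tagging scheme described above can be sketched as a small record type. This is a minimal illustration, not the plugin's actual schema: the `SeedCase` class, its field names, and the `ITERATION_OBJECTIVES` table are assumptions made for this example. Only the objective names and the "gate softening" tag come from the text. The objective and guard values on the sample case are placeholders, because the text does not state them for this incident.

```python
from dataclasses import dataclass

# Objective IDs and names as listed in the text (IO-1 through IO-8).
ITERATION_OBJECTIVES = {
    "IO-1": "dashboard QA",
    "IO-2": "metric optimization",
    "IO-3": "repair sweeps",
    "IO-4": "schema migration",
    "IO-5": "prompt refinement",
    "IO-6": "benchmark calibration",
    "IO-7": "coverage expansion",
    "IO-8": "architecture search",
}


@dataclass(frozen=True)
class SeedCase:
    """One historical incident tagged along all three axes (hypothetical schema)."""

    name: str
    objective: str     # key into ITERATION_OBJECTIVES
    anti_pattern: str  # e.g. "gate softening"
    guard: str         # catastrophe guard that would have caught the incident

    def __post_init__(self):
        # Reject cases that are not tagged on every axis.
        if self.objective not in ITERATION_OBJECTIVES:
            raise ValueError(f"unknown iteration objective: {self.objective}")
        if not self.anti_pattern or not self.guard:
            raise ValueError("anti_pattern and guard must be non-empty")


# Only the anti-pattern tag is taken from the text; the other two tags are
# placeholders chosen for illustration.
case = SeedCase(
    name="SKL placeholder-midnight reversal",
    objective="IO-1",        # placeholder
    anti_pattern="gate softening",
    guard="example-guard",   # placeholder
)
```

Validating the tags at construction time keeps an incompletely tagged incident out of the seed set, which matters if the cases double as a regression suite.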
The 4 skills:
- classify: given a task description, return the iteration objective + recommended pattern
- guard: given a proposed change, check against anti-patterns and catastrophe conditions
- retrospect: given a completed session, extract a new seed case if the outcome was novel
- orchestrate: chain the other three into a self-improvement loop
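How the four skills might compose can be sketched with plain functions. Everything here is a toy stand-in under stated assumptions: the keyword rules in `classify`, the substring check in `guard`, and the notion of "novel" in `retrospect` (an objective and anti-pattern combination not seen before) are invented for illustration and are not the plugin's actual logic.

```python
from typing import Optional


def classify(task: str) -> str:
    """Return an iteration objective ID for a task description (toy keyword rule)."""
    text = task.lower()
    if "dashboard" in text:
        return "IO-1"
    if "metric" in text:
        return "IO-2"
    return "unclassified"


def guard(change: str, anti_patterns: set[str]) -> list[str]:
    """Return the anti-patterns named in a proposed change (toy substring check)."""
    text = change.lower()
    return sorted(p for p in anti_patterns if p in text)


def retrospect(
    task: str,
    objective: str,
    violations: list[str],
    known_cases: set[tuple[str, tuple[str, ...]]],
) -> Optional[dict]:
    """Return a new seed case if the outcome is novel, else None.

    Assumption: "novel" means this (objective, anti-patterns) pair is not
    already among the known cases.
    """
    key = (objective, tuple(violations))
    if not violations or key in known_cases:
        return None
    return {"task": task, "objective": objective, "anti_patterns": violations}


def orchestrate(
    task: str,
    change: str,
    anti_patterns: set[str],
    known_cases: set[tuple[str, tuple[str, ...]]],
) -> dict:
    """Chain classify -> guard -> retrospect for one proposed change."""
    objective = classify(task)
    violations = guard(change, anti_patterns)
    new_case = retrospect(task, objective, violations, known_cases)
    return {
        "objective": objective,
        "violations": violations,
        "accepted": not violations,  # any anti-pattern hit rejects the change
        "new_case": new_case,
    }
```

A call such as `orchestrate("tune the metric threshold", "apply gate softening to pass", {"gate softening"}, set())` would classify the task as IO-2, flag the change, and propose a new seed case.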
Why It Matters
This is the first time the vault’s incident knowledge has been made executable. Previously, a new session had to re-derive “don’t soften the gate” from reading MEMORY.md entries. Now the plugin’s guard skill checks proposed changes against the full anti-pattern catalog automatically. The seed cases also serve as a regression suite: any change to the taxonomy must still correctly classify all 31 historical incidents.
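The regression-suite idea can be illustrated with a short check that replays seed cases through a classifier and reports any that no longer match their recorded tag. The rows in `SEED_CASES`, both classifier versions, and the `regression_failures` helper are hypothetical stand-ins; the real suite would hold the 31 vault incidents.

```python
from typing import Callable

# Stand-in (description, expected objective) pairs, made up for illustration.
SEED_CASES = [
    ("dashboard tile renders stale data", "IO-1"),
    ("metric threshold tuned on validation set", "IO-2"),
]


def classify_v1(description: str) -> str:
    """Baseline toy classifier with one keyword rule per objective."""
    text = description.lower()
    if "dashboard" in text:
        return "IO-1"
    if "metric" in text:
        return "IO-2"
    return "unclassified"


def classify_v2(description: str) -> str:
    """A taxonomy edit that accidentally drops the dashboard rule."""
    return "IO-2" if "metric" in description.lower() else "unclassified"


def regression_failures(
    classify: Callable[[str], str],
    seed_cases: list[tuple[str, str]],
) -> list[str]:
    """Return descriptions of seed cases whose classification changed."""
    return [desc for desc, expected in seed_cases if classify(desc) != expected]


# The baseline passes; the edited taxonomy fails on the dashboard incident.
baseline_failures = regression_failures(classify_v1, SEED_CASES)
edited_failures = regression_failures(classify_v2, SEED_CASES)
```

Returning the failing descriptions, rather than a boolean, makes it clear which historical incident a taxonomy change has broken.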
The Huang Constraint (no iteration objective may improve its target metric by degrading a metric owned by another objective) prevents the most common inter-objective conflict: metric optimization (IO-2) weakening gates that dashboard QA (IO-1) depends on.
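One way to read the Huang Constraint as a check is sketched below. It is a simplified interpretation, not the plugin's formal definition: it assumes every metric has exactly one owning objective, that higher values are better, and that any drop in a metric owned by a different objective counts as a violation. The metric names, the `tolerance` parameter, and the `huang_violations` function are hypothetical.

```python
def huang_violations(
    actor: str,
    owners: dict[str, str],
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return metrics owned by other objectives that degraded.

    actor:     objective proposing the change, e.g. "IO-2"
    owners:    metric name -> owning objective
    before:    metric values prior to the change (higher is better, by assumption)
    after:     metric values following the change
    tolerance: drop allowed before a metric counts as degraded
    """
    return sorted(
        metric
        for metric, owner in owners.items()
        if owner != actor and after[metric] < before[metric] - tolerance
    )


# Illustrative numbers only: IO-2 raises its own score while lowering a gate
# metric that IO-1 owns, which is the conflict the text describes.
owners = {"gate_pass_rate": "IO-1", "target_score": "IO-2"}
before = {"gate_pass_rate": 0.98, "target_score": 0.71}
after = {"gate_pass_rate": 0.90, "target_score": 0.80}

violations = huang_violations("IO-2", owners, before, after)
```

In this example `violations` contains `gate_pass_rate`, so the change would be rejected even though IO-2's own metric improved. A drop in the actor's own metric is deliberately ignored here, since the constraint as stated concerns only metrics owned by other objectives.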
Evidence
- Plugin directory: ~/.claude/plugins/self-improving-toolkit/
- 8 iteration objectives defined with formal Huang Constraint
- 31 seed cases, each traceable to a vault pitfall or breakthrough entry
- 4 skills: classify, guard, retrospect, orchestrate
- Cross-references to projects/dakka/_index for orchestrator integration