Breakthrough Memory bloomnet

Self-improving toolkit plugin: 3-axis taxonomy + 31 seed cases in one session

breakthrough, ai-agents, ml

A full Claude plugin for self-improving agent patterns was scaffolded from scratch in a single session. The plugin encodes a 3-axis taxonomy (iteration-objective, anti-pattern, catastrophe-prevention), 8 iteration objectives with the Huang Constraint, 31 seed cases drawn from real incidents across all repos, and 4 skills plus an orchestrator agent.

What Happened

The problem: agent improvement patterns were scattered across vault pitfalls, breakthrough entries, and ad-hoc session notes. The karpathy-ratchet skill knew about metric optimization. The ralph-loop skill knew about dashboard QA. Neither knew about the other, and neither could classify a new improvement task into the right pattern.

The plugin unifies this into a structured taxonomy along three axes:

  1. Iteration objectives (IO-1 through IO-8). Each objective defines what “better” means for a class of agent task. IO-1 is dashboard QA (ralph-loop), IO-2 is metric optimization (karpathy-ratchet), and IO-3 through IO-8 cover repair sweeps, schema migration, prompt refinement, benchmark calibration, coverage expansion, and architecture search, respectively.

  2. Anti-patterns. 12 cataloged failure modes that recur when agents try to self-improve: gate softening, synthetic data substitution, artifact back-filling, fix-plan-as-fix, downstream workarounds, and others. Each anti-pattern is cross-referenced to the iteration objective where it most commonly occurs.

  3. Catastrophe prevention. 7 guard conditions that must hold before, during, and after any self-improvement loop. These map directly to the Stella quality hooks: catastrophic-guard categories, depth limits, IO ratio bounds.
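The objective numbering and the named anti-patterns above can be written down as plain data. The following is a minimal sketch, not the plugin's actual schema: the names `IterationObjective`, `NAMED_ANTI_PATTERNS`, `SKILL_FOR_OBJECTIVE`, and `describe` are assumptions made for illustration, and only the five anti-patterns named in this entry are listed (the full catalog has 12).

```python
from enum import Enum


class IterationObjective(Enum):
    """The eight iteration objectives, in the order given in this entry."""
    IO_1 = "dashboard QA"
    IO_2 = "metric optimization"
    IO_3 = "repair sweeps"
    IO_4 = "schema migration"
    IO_5 = "prompt refinement"
    IO_6 = "benchmark calibration"
    IO_7 = "coverage expansion"
    IO_8 = "architecture search"


# Only the anti-patterns named in this entry; the remaining 7 are not listed here.
NAMED_ANTI_PATTERNS = (
    "gate softening",
    "synthetic data substitution",
    "artifact back-filling",
    "fix-plan-as-fix",
    "downstream workarounds",
)

# Existing skills that the entry associates with the first two objectives.
SKILL_FOR_OBJECTIVE = {
    IterationObjective.IO_1: "ralph-loop",
    IterationObjective.IO_2: "karpathy-ratchet",
}


def describe(objective):
    """Render an objective as 'IO-n: description (skill)'; the skill is omitted if unknown."""
    label = objective.name.replace("_", "-")
    skill = SKILL_FOR_OBJECTIVE.get(objective)
    suffix = f" ({skill})" if skill else ""
    return f"{label}: {objective.value}{suffix}"


for objective in IterationObjective:
    print(describe(objective))
```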

The 31 seed cases are not synthetic examples. Each one is a real incident from the vault: the SKL placeholder-midnight reversal (gate softening), the bloomnet.db synthetic data incident (synthetic data substitution), the 10-gap fix-plan incident (artifact back-filling). Every case is tagged with its iteration objective, the anti-pattern it triggered, and the catastrophe guard that would have caught it.
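One way to picture the three tags on a seed case is a small record with a completeness check. This is a sketch under assumptions: the `SeedCase` class, its field names, and `missing_tags` are hypothetical, not the plugin's real format. The example fills in only the anti-pattern, because this entry does not state the objective or guard for that incident, so the check reports the other two tags as missing.

```python
from dataclasses import dataclass

# The three tags every seed case is expected to carry (field names are assumed).
REQUIRED_TAGS = ("iteration_objective", "anti_pattern", "catastrophe_guard")


@dataclass(frozen=True)
class SeedCase:
    incident: str
    anti_pattern: str = ""
    iteration_objective: str = ""
    catastrophe_guard: str = ""

    def missing_tags(self):
        """Return the names of required tags that are empty or whitespace."""
        return [tag for tag in REQUIRED_TAGS if not getattr(self, tag).strip()]


# Only the anti-pattern is given in this entry; the other tags are left blank on purpose.
case = SeedCase(
    incident="bloomnet.db synthetic data incident",
    anti_pattern="synthetic data substitution",
)
print(case.missing_tags())  # ['iteration_objective', 'catastrophe_guard']
```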

The 4 skills:

  • classify: given a task description, return the iteration objective + recommended pattern
  • guard: given a proposed change, check against anti-patterns and catastrophe conditions
  • retrospect: given a completed session, extract a new seed case if the outcome was novel
  • orchestrate: chain the other three into a self-improvement loop
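The chaining done by orchestrate could look roughly like the sketch below. Everything here is an assumption for illustration: the function signatures, the `LoopResult` type, the rule that a change with guard violations skips retrospect, and the toy stand-ins for the three skills (simple keyword checks, not the real skill logic).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopResult:
    objective: str
    violations: List[str] = field(default_factory=list)
    new_seed_case: Optional[dict] = None


def orchestrate(task, proposed_change, classify, guard, retrospect):
    """Chain classify -> guard -> retrospect for one proposed change."""
    objective = classify(task)
    violations = guard(proposed_change)
    if violations:
        # Assumed policy: a blocked change ends the pass before retrospect runs.
        return LoopResult(objective, violations)
    session = {"task": task, "change": proposed_change, "objective": objective}
    return LoopResult(objective, [], retrospect(session))


# Toy stand-ins for the three skills (hypothetical keyword rules).
def toy_classify(task):
    return "IO-2" if "metric" in task else "IO-1"


def toy_guard(change):
    return ["gate softening"] if "lower the gate" in change else []


def toy_retrospect(session):
    return None  # pretend the outcome was not novel


result = orchestrate(
    "improve the metric", "lower the gate threshold",
    toy_classify, toy_guard, toy_retrospect,
)
print(result.objective, result.violations)
```

Passing the skills in as callables keeps the loop itself independent of how each skill is implemented.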

Why It Matters

This is the first time the vault’s incident knowledge has been made executable. Previously, a new session had to re-derive “don’t soften the gate” from reading MEMORY.md entries. Now the plugin’s guard skill checks proposed changes against the full anti-pattern catalog automatically. The seed cases also serve as a regression suite: any change to the taxonomy must still correctly classify all 31 historical incidents.
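The regression-suite idea can be sketched as a function that replays the seed cases through a classifier and reports any mismatch. This is illustrative only: `regression_failures` and both classifiers are hypothetical, the classifiers use toy keyword rules, and only the three incidents named in this entry are included rather than all 31.

```python
# (incident, expected anti-pattern) pairs for the three incidents named in this entry.
SEED_CASES = [
    ("SKL placeholder-midnight reversal", "gate softening"),
    ("bloomnet.db synthetic data incident", "synthetic data substitution"),
    ("10-gap fix-plan incident", "artifact back-filling"),
]


def regression_failures(classify, seed_cases=SEED_CASES):
    """Return (incident, expected, actual) for every case the classifier gets wrong."""
    failures = []
    for incident, expected in seed_cases:
        actual = classify(incident)
        if actual != expected:
            failures.append((incident, expected, actual))
    return failures


def toy_classifier(incident):
    """Hypothetical keyword-based classifier standing in for the real taxonomy."""
    keyword_to_pattern = {
        "placeholder-midnight": "gate softening",
        "synthetic": "synthetic data substitution",
        "fix-plan": "artifact back-filling",
    }
    for keyword, pattern in keyword_to_pattern.items():
        if keyword in incident:
            return pattern
    return "unclassified"


def broken_classifier(incident):
    """Simulates a taxonomy edit that accidentally drops the gate-softening rule."""
    pattern = toy_classifier(incident)
    return "unclassified" if pattern == "gate softening" else pattern


print(regression_failures(toy_classifier))     # no failures
print(regression_failures(broken_classifier))  # the SKL case is reported
```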

The Huang Constraint (no iteration objective may improve its target metric by degrading a metric owned by another objective) prevents the most common inter-objective conflict: metric optimization (IO-2) weakening gates that dashboard QA (IO-1) depends on.
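A check in the spirit of the Huang Constraint might look like the sketch below. It is a conservative reading, not the plugin's formal definition: it flags any degraded metric owned by a different objective, whether or not the acting objective's own metric improved. The metric names `val_score` and `gate_pass_rate`, the ownership map, and the convention that a negative delta means "worse" are all assumptions.

```python
def huang_violations(acting_objective, metric_deltas, metric_owner):
    """Return metrics owned by other objectives that the change makes worse.

    metric_deltas maps metric name -> signed change, where negative means degraded.
    Metrics with no registered owner are ignored.
    """
    return sorted(
        metric
        for metric, delta in metric_deltas.items()
        if delta < 0 and metric_owner.get(metric) not in (None, acting_objective)
    )


# Hypothetical ownership: IO-2 owns its target score, IO-1 owns the gate pass rate.
METRIC_OWNER = {"val_score": "IO-2", "gate_pass_rate": "IO-1"}

# An IO-2 change that raises its own score while weakening an IO-1 gate.
deltas = {"val_score": 0.03, "gate_pass_rate": -0.10}
print(huang_violations("IO-2", deltas, METRIC_OWNER))  # ['gate_pass_rate']
```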

Evidence

  • Plugin directory: ~/.claude/plugins/self-improving-toolkit/
  • 8 iteration objectives defined, with a formal Huang Constraint
  • 31 seed cases, each traceable to a vault pitfall or breakthrough entry
  • 4 skills: classify, guard, retrospect, orchestrate
  • Cross-references to projects/dakka/_index for orchestrator integration