Journal

Three new repos, iteration objective taxonomy, and the event-driven A/B system ships

ai-agents, ml, data-eng, ops

Signal

The biggest shipping day of April by commit count: 90 commits across 8 repos. Three new repos materialized, among them self-improving-toolkit (full plugin with 3-axis taxonomy, 8 iteration objectives, 31 seed cases, 4 skills) and agent-mqi (standalone MQI dashboard extracted for public consumption). In parallel, the jobs-apply event-driven A/B system went from design spec to live with 8 experiments seeded, and a missing React hook that had broken Vercel builds for 24+ hours was found and fixed.

Evidence

  • self-improving-toolkit (19 commits): Full plugin scaffolded from scratch. The 3-axis taxonomy master frame classifies self-improvement patterns along iteration-objective, anti-pattern, and catastrophe-prevention axes. 8 iteration objectives defined with the Huang Constraint (the bound on self-modification that prevents reward hacking). 6 anti-pattern classes, 5 catastrophe prevention classes, 3 eval harness archetypes. 31 seed cases pulled from real incidents across all repos. 4 skills shipped: invocation-logger, self-healing-audit, experiment-design, eval-harness-audit. Orchestrator wired up. CLAUDE.md written.
  • vault (16 commits): The iteration-objective taxonomy that the toolkit operationalizes. 8 IO definitions with meta-definition. IO axis added to self-improving-agent-patterns topic and skill. Karpathy-ratchet and ralph-loop tagged with their IO classifications. IO taxonomy heatmap generated. 3 research papers ingested (Reflexion, DGM, Huang). Huang constraint formalized as a pitfall. 3 eval harness archetype definitions, 19-benchmark catalog, 3 deep-dives, and an IO x eval harness compatibility matrix. LLM-as-judge epistemic biases pitfall written.
  • jobs-apply (24 commits): Event-driven A/B system complete from spec to running code. ExperimentQueueRepository, DB helpers (increment/reset/conclude/threshold), seed script for 8 experiments, promotion engine with Bayesian endpoint and auto-promotion, cron safety net, /api/internal/ab-health observability route. Also fixed a missing use-engine-status hook that silently broke Vercel builds. That build was down for 24+ hours before diagnosis; the fix was a one-line re-export.
  • agent-mqi (18 commits): New repo created. Standalone MQI dashboard extracted from rusty-bloomnet, thought leadership blog post written, viral README redesign with three-act structure and dashboard thumbnails.
  • public-lab (8 commits): agent-mqi content hub page with 14 dashboard screenshots, hero stories section. Self-improving systems toolkit spec and plan (17 tasks). YAML/schema repair.
  • rusty-dakka (2 commits): Playwright e2e test suite and avatar dev scripts.
  • rusty-bloomnet (3 commits): JSON API status parser, Opus 4.7 incident mapping, toISODate fix.

So What

The through-line is the iteration-objective taxonomy crystallizing into tools. The vault IO taxonomy is the theory (what kinds of self-improvement exist, what bounds them, what goes wrong). The self-improving-toolkit plugin is the practice (given a real codebase, classify its improvement patterns, detect anti-patterns, design experiments). The 31 seed cases that bridge theory and practice are all pulled from real incidents in this ecosystem, not synthetic examples. That matters because the taxonomy encodes failures that actually happened: the Huang Constraint pitfall was written from a near-miss in the SKL audit pipeline where a gate was weakened to pass a metric, and the catastrophe prevention classes include “artifact back-filling”, which cost two days in the April 17-18 gap incident.
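To make the three-axis classification concrete, here is a minimal sketch of how a seed case might be recorded and validated. The class counts (8, 6, 5) come from the entry above; everything else, including the SeedCase shape, the numeric class indices, and validateSeedCase, is a hypothetical illustration and not the toolkit's actual schema.

```typescript
// The three axes named in this entry.
type Axis = "iteration-objective" | "anti-pattern" | "catastrophe-prevention";

// Class counts per axis, as stated above (8 IOs, 6 anti-patterns, 5 catastrophe classes).
const CLASS_COUNTS: Record<Axis, number> = {
  "iteration-objective": 8,
  "anti-pattern": 6,
  "catastrophe-prevention": 5,
};

// Hypothetical record shape: the real seed-case format is not shown in this entry.
interface SeedCase {
  id: string;
  repo: string;
  summary: string;
  // 1-based class index per axis; an axis is omitted when it does not apply.
  classes: Partial<Record<Axis, number>>;
}

// Returns a list of problems; an empty list means the case is well-formed.
function validateSeedCase(c: SeedCase): string[] {
  const errors: string[] = [];
  const axes = Object.keys(c.classes) as Axis[];
  if (axes.length === 0) {
    errors.push(`${c.id}: no axis assigned`);
  }
  for (const axis of axes) {
    const idx = c.classes[axis];
    const max = CLASS_COUNTS[axis];
    if (idx === undefined || !Number.isInteger(idx) || idx < 1 || idx > max) {
      errors.push(`${c.id}: ${axis} class ${idx} outside 1..${max}`);
    }
  }
  return errors;
}
```

Keeping the axes independent means one incident can carry both an anti-pattern and a catastrophe-prevention class, which fits a case like a weakened gate that also led to back-filled artifacts.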

The jobs-apply A/B system landing on the same day is parallel execution, not the same arc. But the 24-hour Vercel build break is the kind of thing that validates the /api/internal/ab-health observability route: if a one-line missing hook can silently break production for a full day, you need health checks that surface failures faster than manual discovery. The event-driven design means experiments promote themselves when their threshold is hit, rather than requiring a manual review pass.
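The threshold-then-promote flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the jobs-apply implementation: the Experiment shape, recordEvent, the 0.95 cutoff, the Beta(1,1) prior, and the normal approximation to the posterior are all hypothetical choices, and the real promotion engine may compute its Bayesian endpoint differently.

```typescript
// Hypothetical shapes; the real ExperimentQueueRepository schema is not shown here.
interface ArmStats { conversions: number; trials: number; }

interface Experiment {
  id: string;
  control: ArmStats;
  variant: ArmStats;
  eventsSinceCheck: number; // bumped on every event ("increment")
  threshold: number;        // events required before an evaluation runs
  status: "running" | "promoted" | "concluded";
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation (7.1.26).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
    + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Approximate P(variant rate > control rate) using Beta(1,1) priors and a
// normal approximation to each posterior (an assumption, chosen for brevity).
function probVariantBeatsControl(control: ArmStats, variant: ArmStats): number {
  const moments = (s: ArmStats) => {
    const a = 1 + s.conversions;
    const b = 1 + s.trials - s.conversions;
    const n = a + b;
    return { mean: a / n, variance: (a * b) / (n * n * (n + 1)) };
  };
  const c = moments(control);
  const v = moments(variant);
  return normalCdf((v.mean - c.mean) / Math.sqrt(c.variance + v.variance));
}

// Event handler: increment counters, and evaluate only once the threshold is hit.
function recordEvent(
  exp: Experiment,
  arm: "control" | "variant",
  converted: boolean,
  promoteAt = 0.95,
): Experiment {
  if (exp.status !== "running") return exp;
  const prev = exp[arm];
  const bumped: ArmStats = {
    conversions: prev.conversions + (converted ? 1 : 0),
    trials: prev.trials + 1,
  };
  const next: Experiment = {
    ...exp,
    control: arm === "control" ? bumped : exp.control,
    variant: arm === "variant" ? bumped : exp.variant,
    eventsSinceCheck: exp.eventsSinceCheck + 1,
  };
  if (next.eventsSinceCheck < next.threshold) return next;

  const p = probVariantBeatsControl(next.control, next.variant);
  if (p >= promoteAt) return { ...next, status: "promoted" };      // variant wins
  if (p <= 1 - promoteAt) return { ...next, status: "concluded" }; // control wins
  return { ...next, eventsSinceCheck: 0 };                         // inconclusive: reset
}
```

Because evaluation runs inside the event path, nothing fires for an experiment that stops receiving traffic. A periodic sweep like the cron safety net mentioned in the Evidence section can cover that gap.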

What’s Next

Self-improving-toolkit needs real usage data before the trigger refinement pass (same V5.3 corpus-gate pattern from the skills audit). Agent-mqi needs public deployment to validate whether the dashboard resonates outside my own workflow. The 8 A/B experiments are seeded and running; first auto-promotion should fire within a week if traffic holds. Four repos discovered today are not yet tracked in vault memory: self-improving-toolkit, competitive-audit-report, agent-mqi, skl-engine.

Log

  • Scope: Three new repos, IO taxonomy, event-driven A/B system, Vercel build fix
  • Artifacts shipped: self-improving-toolkit plugin (3-axis taxonomy, 8 IOs, 31 seed cases, 4 skills, orchestrator), jobs-apply A/B system (ExperimentQueueRepository, promotion engine, Bayesian endpoint, 8 experiments seeded, cron safety net, ab-health route), agent-mqi standalone dashboard + blog post + README, vault IO taxonomy (8 IOs, meta-definition, heatmap, 3 research papers, 19-benchmark catalog, compatibility matrix), public-lab content hub + toolkit spec