Experiment Preferences kiro-cli-factory

A bash loop invoking Claude Code per-spec could autonomously build a multi-component system

Tags: ai-agents, automation, architecture, breakthrough
Result: confirmed
Key Findings

Autonomous spec-driven build works but requires single-spec scoping, error tolerance (no set -e), and escape hatches for agent failures. Prompt anti-dependency bias discovered: agents inject unnecessary dependencies when prompts embed content inline.

Changelog

Date        Summary
2026-04-06  Audited: added Changelog, expanded all sections, stamped last_audited
2026-03-01  Initial creation

Hypothesis

We bet that a simple bash loop (invoke claude once per spec file, in sequence) could autonomously build a multi-component CLI system without a human writing the code. The kiro-cli-factory project had 12 spec files covering authentication, project scaffolding, template rendering, config management, and test harnesses. If Claude Code could process each spec independently and produce working code, the total build time would compress from days to hours.

Method

The ralph loop works like this: iterate over every .spec.md file in the specs/ directory, set the RALPH_SPEC environment variable to the current spec path, and invoke claude --dangerously-skip-permissions -p "$(cat program.md)" in the project root. The program.md contains the universal build instruction: “read RALPH_SPEC, implement the specification, do not add dependencies not required by the spec.”
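A minimal sketch of the loop as described above, not the project's actual script. The function name ralph_loop and the AGENT_CMD override are hypothetical, added only so the loop can be dry-run against a stub; the claude flags and the RALPH_SPEC variable are the ones named in the text.

```shell
#!/usr/bin/env bash
# Sketch of the per-spec loop. AGENT_CMD is a hypothetical override that
# defaults to the real claude CLI.

ralph_loop() {
  local specs_dir="${1:-specs}"
  local prompt_file="${2:-program.md}"
  local spec
  for spec in "$specs_dir"/*.spec.md; do
    [ -e "$spec" ] || continue   # the glob matched nothing
    echo "building: $spec"
    # RALPH_SPEC is set only for this one invocation, so each agent run
    # is scoped to a single spec file.
    RALPH_SPEC="$spec" "${AGENT_CMD:-claude}" \
      --dangerously-skip-permissions -p "$(cat "$prompt_file")"
  done
}
```

Calling ralph_loop with no arguments from the project root would iterate over specs/ and read program.md, matching the layout described above.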

Three problems emerged in the first iteration:

  1. Cascading errors: The initial loop used set -e, so a single Claude failure (often a timeout or a malformed spec) killed the entire build.
  2. Unnecessary dependency injection: When spec content was embedded inline in the prompt (pasted directly into the instruction), Claude consistently injected related-but-unrequested libraries. When the spec was referenced by path instead, the injection disappeared.
  3. Agents going off-track: Some specs were ambiguous enough that Claude chose an interpretation inconsistent with the rest of the system. This called for a “plan reviewer” pass: a separate agent call that reviews Claude’s plan before execution and flags deviations.
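The difference between embedding and referencing in problem 2 can be illustrated with two prompt builders. This is only a sketch: the function names and the exact prompt wording ("Spec file: ...") are assumptions, not the templates used in the experiment.

```shell
# Two hypothetical ways to build the prompt for one spec.
# $1 = instruction file (e.g. program.md), $2 = path to the spec.

# Inline variant: pastes the spec body into the prompt. This is the form
# that coincided with unrequested libraries being injected.
prompt_inline() {
  printf '%s\n\n%s\n' "$(cat "$1")" "$(cat "$2")"
}

# Path variant: the prompt only names the file and leaves reading it
# to the agent.
prompt_by_path() {
  printf '%s\n\nSpec file: %s\n' "$(cat "$1")" "$2"
}
```

The path variant keeps the prompt the same size no matter how long the spec is, and the spec body never appears in the instruction text itself.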

The fix was three changes: remove set -e (failures are recoverable, log and continue), reference spec files by path (never embed content inline), and add a plan reviewer agent that reads the spec plus the proposed plan and calls a replan function if they diverge.
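The log-and-continue change might look roughly like the sketch below. It assumes a failed agent call shows up as a non-zero exit status; the function name, the AGENT_CMD override, and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# No `set -e`: a failed agent call is recorded and the loop moves on.
# AGENT_CMD and ralph-failures.log are hypothetical names.

ralph_loop_tolerant() {
  local specs_dir="${1:-specs}"
  local prompt_file="${2:-program.md}"
  local log_file="${3:-ralph-failures.log}"
  local spec status
  local failed=0
  : > "$log_file"                       # start with an empty failure log
  for spec in "$specs_dir"/*.spec.md; do
    [ -e "$spec" ] || continue
    if RALPH_SPEC="$spec" "${AGENT_CMD:-claude}" \
         --dangerously-skip-permissions -p "$(cat "$prompt_file")"; then
      echo "ok: $spec"
    else
      status=$?                         # exit status of the agent call
      echo "$spec exit=$status" >> "$log_file"
      failed=$((failed + 1))
    fi
  done
  echo "failed specs: $failed (see $log_file)"
  return 0                              # the loop itself never aborts the build
}
```

Wrapping the call in `if` rather than checking `$?` afterwards means the loop keeps its behavior even when sourced into a script that has `set -e` enabled.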

Results

Confirmed with modifications. The ralph loop successfully built all 12 kiro-cli-factory components across a single 4-hour run. Nine specs produced working code on the first pass. Three required the plan reviewer to catch deviations (two were ambiguous spec files, one was a spec that referenced a shared type not yet generated).

The anti-dependency-bias discovery was the most surprising result: reference-vs-embed is a single-line change in the prompt template, but it eliminated an entire class of failure (unnecessary imports, inflated package.json dependencies) across all 12 components.

Findings

  1. Single-spec scoping is non-negotiable. When the entire spec corpus is in context, agents write code that anticipates future specs instead of just implementing the current one. Scoping to one spec at a time keeps each agent’s output bounded.

  2. Error tolerance (no set -e) is load-bearing. A multi-component build has recoverable and unrecoverable failures. Treating all failures as fatal conflates them. Log-and-continue lets the loop finish the 9 easy specs and flag the 3 hard ones for human review.

  3. Prompt anti-dependency bias is a real phenomenon. Embedding spec content inline triggers a different code-generation mode than referencing it by path. The path-reference mode is more focused; the inline mode produces “helpful” additions that bloat the output.

  4. The pattern generalizes. The same ralph loop structure was applied to dakka (initial scaffold) and peon-notify (hook system) with the same 3 constraints applied from the start, and both builds completed without plan reviewer interventions.

Next Steps

The ralph loop pattern is now formalized as a reusable approach. Key parameters: RALPH_SPEC for per-spec scoping, path-reference prompts, and a plan reviewer gate for specs flagged as ambiguous. Apply to any project where the build can be decomposed into independent file-level specifications. See projects/dakka/_index and projects/peon-notify/_index for the next applications.
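One way the plan reviewer gate could be wired is sketched below. The interface is an assumption throughout: REVIEW_CMD and REPLAN_CMD stand in for the reviewer and replan agent calls, and the retry limit is an illustrative parameter, not something the experiment specifies.

```shell
# Hypothetical executables, named through variables:
#   REVIEW_CMD <spec> <plan>  exits 0 when the plan matches the spec
#   REPLAN_CMD <spec> <plan>  rewrites the plan file in place

plan_gate() {
  local spec="$1"
  local plan="$2"
  local max_replans="${3:-2}"
  local attempt=0
  # Keep replanning while the reviewer rejects the plan.
  while ! "$REVIEW_CMD" "$spec" "$plan"; do
    if [ "$attempt" -ge "$max_replans" ]; then
      echo "needs human review: $spec" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    "$REPLAN_CMD" "$spec" "$plan" || return 1
  done
  return 0
}
```

A build loop could call plan_gate before the implementation step and skip to the next spec when it returns non-zero, which fits the log-and-continue approach.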