Definition

Reflection-Accumulating Loop

Iterate by building verbal self-critique memory. Each failure adds to an episodic buffer that steers future attempts. IO-3 in the iteration-objective taxonomy.

Tags: definition, ai-agents, iteration-objective, self-correction

A reflection-accumulating loop iterates by generating natural-language self-critique after each attempt and storing it in an episodic memory buffer. On the next attempt, all prior reflections are prepended to the prompt, creating “semantic gradients” that steer behavior without changing model weights. The vault instance is the Lifecycle Chain: failures become pitfalls, pitfalls become experiments, experiments become breakthroughs. Each stage accumulates structured reflection that informs the next.

How It Works

Attempt task → receive outcome signal → generate verbal reflection (“I failed because X”) → store in episodic buffer → prepend buffer to next attempt → re-attempt. The buffer grows monotonically within a session. Cross-session persistence requires explicit memory management.
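
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the vault's or any paper's implementation: `EpisodicBuffer`, `reflection_loop`, and the three callables (`attempt`, `evaluate`, `reflect`) are hypothetical names, and the toy functions stand in for real model and environment calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class EpisodicBuffer:
    """Append-only store of verbal reflections for one session (hypothetical)."""
    reflections: List[str] = field(default_factory=list)

    def add(self, reflection: str) -> None:
        self.reflections.append(reflection)

    def as_prefix(self) -> str:
        # All prior reflections are prepended to the next prompt.
        if not self.reflections:
            return ""
        lines = "\n".join(f"- {r}" for r in self.reflections)
        return f"Lessons from earlier attempts:\n{lines}\n\n"


def reflection_loop(
    task: str,
    attempt: Callable[[str], str],
    evaluate: Callable[[str], Optional[str]],
    reflect: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, EpisodicBuffer]:
    buffer = EpisodicBuffer()
    answer = ""
    for _ in range(max_attempts):
        answer = attempt(buffer.as_prefix() + task)
        error = evaluate(answer)  # outcome signal; None means success
        if error is None:
            break
        buffer.add(reflect(answer, error))  # the buffer only ever grows
    return answer, buffer


# Toy stand-ins (assumptions) for the model and the environment.
def toy_attempt(prompt: str) -> str:
    return "4" if "Lessons from earlier attempts" in prompt else "5"


def toy_evaluate(answer: str) -> Optional[str]:
    return None if answer == "4" else f"expected 4, got {answer}"


def toy_reflect(answer: str, error: str) -> str:
    return f"I failed because I answered {answer} ({error}); recheck the arithmetic."


final_answer, session_buffer = reflection_loop(
    "What is 2 + 2?", toy_attempt, toy_evaluate, toy_reflect
)
print(final_answer, len(session_buffer.reflections))
```

The buffer here lives only as long as the Python object does, which mirrors the note that persistence across sessions has to be handled explicitly, for example by serializing `reflections` to disk.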

Signal Type

Verbal critique (natural language). There is no scalar gradient, so the quality of the reflection determines convergence. CRITICAL LIMIT: Huang et al. (ICLR 2024) showed empirically that without external feedback, self-reflection tends to DEGRADE reasoning performance: the model often changes correct answers to incorrect ones. In practice, working reflection loops rely on an external signal (environment reward, test suite, human judgment) to ground the reflection. See topics/pitfalls/self-correction-without-external-feedback.
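
One way to respect this limit is to let only an external check trigger reflection, so a passing answer is never handed back for self-revision. The sketch below assumes a tiny test suite as the external signal. `run_test_suite` and `grounded_step` are hypothetical names, and the lambdas stand in for model-generated candidates.

```python
from typing import Callable, List, Optional, Tuple

Case = Tuple[int, int, int]  # (a, b, expected result)


def run_test_suite(func: Callable[[int, int], int], cases: List[Case]) -> Optional[str]:
    """External signal: return the first failure message, or None if all cases pass."""
    for a, b, expected in cases:
        got = func(a, b)
        if got != expected:
            return f"func({a}, {b}) returned {got}, expected {expected}"
    return None


def grounded_step(
    func: Callable[[int, int], int], cases: List[Case], reflections: List[str]
) -> bool:
    """Accept a passing candidate as-is; reflect only on an externally observed failure."""
    failure = run_test_suite(func, cases)
    if failure is None:
        return True  # no self-critique is requested for a passing answer
    reflections.append(f"I failed because {failure}.")
    return False


CASES: List[Case] = [(2, 3, 5), (0, 0, 0)]
reflections: List[str] = []

accepted_good = grounded_step(lambda a, b: a + b, CASES, reflections)
accepted_bad = grounded_step(lambda a, b: a - b, CASES, reflections)
print(accepted_good, accepted_bad, reflections)
```

The design choice is that the reflection text is derived from the observed failure rather than from the model's own opinion of its answer, which is the grounding this section calls for.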

Academic Exemplars

  • Reflexion (Shinn et al., NeurIPS 2023): +22% absolute on AlfWorld and +20% on HotPotQA via verbal reinforcement learning. An external reward signal triggers reflection; reflections persist as episodic memory.
  • Self-Refine (Madaan et al., NeurIPS 2023): The same model critiques and refines its own output. ~20% average improvement across 7 tasks, BUT it degrades when no ground truth is available (Huang constraint).
  • Chain of Verification / CoVe (Dhuliawala et al., Findings of ACL 2024): Decoupled verification, in which each claim is checked independently to prevent error propagation. +23% F1 on closed-book MultiSpanQA; precision also improves on Wikidata list questions.
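
The decoupling idea in the last item can be illustrated with a deliberately simplified sketch. It is not the CoVe pipeline from the paper (which plans verification questions and answers them with the model). Here `decoupled_verify`, `revise`, and the `KNOWN_FACTS` lookup are hypothetical stand-ins that only show the property that each check sees a single claim and nothing else.

```python
from typing import Callable, Dict, List

# Toy knowledge source (assumption); a real system would query a model or a database.
KNOWN_FACTS = {
    "Paris is the capital of France",
    "Water boils at 100 C at sea level",
}


def decoupled_verify(
    claims: List[str], check_claim: Callable[[str], bool]
) -> Dict[str, bool]:
    """Check every claim in isolation.

    Each call receives exactly one claim and never the full draft, so an
    error in one claim cannot bias the verdict on another.
    """
    return {claim: check_claim(claim) for claim in claims}


def revise(claims: List[str], verdicts: Dict[str, bool]) -> List[str]:
    """Keep only the claims that passed their independent check."""
    return [claim for claim in claims if verdicts[claim]]


draft_claims = [
    "Paris is the capital of France",
    "Berlin is the capital of Spain",
]
verdicts = decoupled_verify(draft_claims, lambda claim: claim in KNOWN_FACTS)
revised = revise(draft_claims, verdicts)
print(revised)
```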

Vault Instances

  • Lifecycle Chain: failure → pitfall → research → experiment → breakthrough. Structured reflection across the vault’s knowledge lifecycle.
  • Dashboard QA weakness log: documents blind spots after each QA cycle, informing the next.