Adversarial-Competing Loop
Iterate by competing against self or other agents, using adversarial pressure to force capability improvement. IO-6 in the iteration-objective taxonomy.
An adversarial-competing loop iterates by pitting agents (or model versions) against each other. The competitive pressure forces both sides to develop stronger strategies. Unlike reflection (IO-3), which relies on self-judgment, adversarial loops use an OPPONENT as the feedback signal: the opponent’s attempts to exploit weaknesses provide external pressure that drives genuine improvement. This includes self-play (same agent against past versions), multi-agent debate (multiple agents argue toward consensus), and Constitutional AI (principles as implicit adversary).
How It Works
Agent A generates output → Agent B critiques/attacks → Agent A defends/revises → roles may swap → iterate until convergence or fixed rounds. The adversarial structure prevents the self-deception problem identified by Huang et al.: the opponent provides external pressure that self-reflection cannot.
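The flow above can be sketched as a small driver loop. This is a minimal illustration, not a reference implementation: the names `adversarial_loop`, `generate`, `attack`, and `revise` are hypothetical, the toy agents stand in for real models, and role swapping is left out for brevity. Convergence is assumed here to mean "the opponent finds nothing left to attack".

```python
from typing import Callable, Optional, Tuple

def adversarial_loop(
    generate: Callable[[], str],
    attack: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Run generate -> attack -> revise until the attacker has no objection
    or the round budget is spent. Returns (final output, rounds used)."""
    output = generate()
    for round_no in range(1, max_rounds + 1):
        objection = attack(output)
        if objection is None:
            # Convergence: the opponent found no remaining weakness.
            return output, round_no
        output = revise(output, objection)
    # Fixed-round cutoff: stop even if the attacker could still object.
    return output, max_rounds

# Toy agents (assumptions for illustration only).
REQUIRED = ["evidence", "scope"]

def toy_generate() -> str:
    return "claim"

def toy_attack(text: str) -> Optional[str]:
    # The attacker objects to the first required element that is missing.
    for word in REQUIRED:
        if word not in text:
            return word
    return None

def toy_revise(text: str, objection: str) -> str:
    # The defender patches exactly the weakness that was attacked.
    return f"{text} +{objection}"

final, rounds = adversarial_loop(toy_generate, toy_attack, toy_revise)
```

The point of the structure is that `attack` is supplied from outside Agent A, so the stopping decision never rests on A's own judgment of its output.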
Signal Type
Win/loss or consensus signal. The opponent IS the evaluation function. Arms-race dynamics: each improvement by one side creates pressure for the other to improve. Risk: an unbounded arms race can diverge rather than converge, so the loop needs an explicit stopping criterion or a formal convergence result such as the one given for SPIN.
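One possible stopping criterion, sketched under assumptions: treat the challenger's win rate against the incumbent as the signal, and stop once it has stayed near 50% for several generations (neither side is gaining) or a hard generation budget is reached. The function name `should_stop` and the `tolerance`, `patience`, and `max_generations` parameters are hypothetical choices, not part of any cited method.

```python
from typing import Sequence

def should_stop(
    win_rates: Sequence[float],
    tolerance: float = 0.05,
    patience: int = 3,
    max_generations: int = 50,
) -> bool:
    """Decide whether an adversarial loop should halt.

    win_rates[i] is the challenger's win rate against the incumbent
    in generation i (0.0 to 1.0).
    """
    # Hard budget: guards against an arms race that never settles.
    if len(win_rates) >= max_generations:
        return True
    # Not enough history yet to judge a plateau.
    if len(win_rates) < patience:
        return False
    # Plateau: the last `patience` generations were all close to an even split.
    recent = win_rates[-patience:]
    return all(abs(rate - 0.5) <= tolerance for rate in recent)
```

A plateau near 50% does not prove convergence; it only shows that the current pair of agents has stopped separating, which is why the hard budget is kept as a second guard.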
Academic Exemplars
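- Self-Play (AlphaGo/AlphaZero): Agent plays against itself or past versions. In AlphaGo Zero, a new generation replaces the current best only if it beats it in evaluation games; AlphaZero drops that gate and updates a single network continuously. In two-player zero-sum games the ideal fixed point of self-play is a Nash equilibrium, though these systems come with no formal convergence proof.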
- Multi-Agent Debate (Du et al. 2023): Multiple LLM instances argue over multiple rounds. Improves factuality. Caveat: Huang et al. showed debate doesn’t outperform self-consistency when controlling for compute.
- Constitutional AI (Bai et al., Anthropic, 2022): Principles act as implicit adversary. The model critiques and revises its own outputs against a written constitution, and AI-generated preference labels then provide the RLAIF training signal.
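The "must beat the previous generation" idea from the self-play exemplar can be sketched as a gated promotion loop. This is a toy illustration under assumptions: `gated_self_play`, `propose`, and `play_match` are hypothetical names, agents are plain integers standing in for model checkpoints, and the 0.55 default threshold is an illustrative value rather than a claim about any specific system's configuration.

```python
from typing import Callable, List, Tuple

def gated_self_play(
    champion: int,
    propose: Callable[[int], int],
    play_match: Callable[[int, int], bool],
    generations: int,
    games_per_eval: int = 20,
    win_threshold: float = 0.55,
) -> Tuple[int, List[int]]:
    """Promote a candidate only if it beats the current champion often enough.

    play_match(candidate, champion) returns True when the candidate wins.
    Returns the final champion and the list of every champion so far.
    """
    history = [champion]
    for _ in range(generations):
        candidate = propose(champion)
        wins = sum(play_match(candidate, champion) for _ in range(games_per_eval))
        if wins / games_per_eval >= win_threshold:
            # The candidate survived adversarial evaluation: it becomes the new bar.
            champion = candidate
            history.append(champion)
        # Otherwise the candidate is discarded and the champion is kept.
    return champion, history

# Toy setup: an agent is an integer "skill", and the higher skill always wins.
steps = iter([1, -2, 3])

def toy_propose(current: int) -> int:
    return current + next(steps)

def toy_match(candidate: int, incumbent: int) -> bool:
    return candidate > incumbent

best, lineage = gated_self_play(0, toy_propose, toy_match, generations=3)
```

In the toy run the second candidate is weaker than the champion and is rejected, so the lineage records only the generations that actually won their evaluation.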
Vault Instances
- Agent-MQI: Tracks model quality across Claude versions. When Opus degrades within its own cohort, the system switches to the better-performing version: adversarial pressure between model generations.
- Stella quality hooks: 4-hook system where hooks act as adversarial gates on agent behavior (catastrophic, depth, io-ratio, drift).
Related
- Iteration Objective: taxonomy axis
- Reflection-Accumulating: IO-3 (self-judgment, not external opponent)
- Population-Evolving: IO-5 (selection, not competition)