Pirate Ship Provider Quarantine Circuit Breaker
What Happened

Running campaigns across 35 models and 4 providers simultaneously, rate limit storms (HTTP 429) cascaded. A single quarantined provider silently disabled a large fraction of model coverage. Campaigns completed with fewer models than expected, and the original implementation hid provider initialization errors behind defensive try/except blocks that returned None on failure : no exception raised, no log warning, just missing results.
The quarantine policy (3 consecutive 429s OR ≥30% 429s in last 20 calls) is aggressive enough that a brief rate-limit spike with 4 concurrent calls knocks out a provider for the rest of a campaign.
Root Cause
Two compounding decisions made failures invisible. First: provider adapters caught init errors and returned None instead of raising. Second: the circuit breaker quarantine state was not surfaced in campaign reports, so there was no way to know which models had been disabled and why. Failure was silent at two independent layers.
How to Avoid
- Fail-fast on init (commit
455224a): provider registry now raises immediately if any provider cannot be created. “Initialization aborts if any provider cannot be created.” - Remove defensive try/except (commit
5fda378): root-level campaign scripts no longer catch and swallow errors from provider calls - Quarantine stats in reports (commit
421f590): dashboard shows quarantine events from provider logs so the operator can see which models were disabled and why
For evaluation frameworks where completeness matters, circuit breakers protect against runaway costs but can silently reduce coverage. Always surface quarantine state prominently.
Related
- projects/pirate-ship/_index : parent project
- pirate-ship: project context
- redcorsair-campaign-rate-limiting: earlier rate-limiting approach in redcorsair