
A provider-transparent rate limiter keyed by API key + provider can prevent cost blowouts on free-tier API plans without requiring consumer changes

February 26, 2026

ai, rate-limiting, architecture, career

Hypothesis

A provider-transparent rate limiter, keyed by API key plus provider, can prevent cost blowouts on free-tier API plans without requiring any consumer changes.

Result: confirmed

Key Findings

Zero consumer changes were needed. Rate limiting stays invisible inside the provider layer. It prevented cost blowouts during early development on Tier 1 plans, and the pattern persisted into production.

Changelog

Date	Summary
2026-04-07	Created during temporal gap audit
2026-02-27	Original experiment

A transparent sliding-window rate limiter, keyed by API key plus provider name, can prevent API cost blowouts on free and Tier 1 plans without any changes to the consuming code. The bet was that rate limiting is a horizontal concern, not a caller concern. If every call site had to know its own quota and back off locally, the system would drift out of sync with provider reality the first time a plan changed. A single choke-point inside the provider layer should be able to absorb that volatility invisibly.

Method

I built rate-limiter.ts with conservative defaults calibrated to real free-tier and Tier 1 limits: Google at 10 requests per minute, Anthropic at 50, OpenAI at 500, and a fallback of 20 for any provider not explicitly configured. The limiter keeps a sliding window (the timestamps of calls made in the past 60 seconds) rather than a fixed-window counter, because a fixed window resets at each minute boundary, so a burst that straddles the boundary can send up to twice the quota within 60 seconds. On a 429 response, the wrapper retries with exponential backoff (250 ms, 500 ms, 1 s, 2 s, 4 s, capped at 30 s) and surfaces the final error only if every retry fails.
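A minimal sketch of the sliding-window check and the backoff schedule described above. This is not the actual rate-limiter.ts: the names SlidingWindowLimiter, tryAcquire, and backoffDelay are hypothetical, and the clock is passed in as a parameter only to keep the behaviour easy to check. The per-minute limits and the 250 ms base and 30 s cap are taken from the text.

```typescript
// Per-minute defaults from the text; unlisted providers use the fallback.
const DEFAULT_RPM = new Map<string, number>([
  ["google", 10],
  ["anthropic", 50],
  ["openai", 500],
]);
const FALLBACK_RPM = 20;
const WINDOW_MS = 60_000;

// Hypothetical class name; one timestamp list per (provider, API key) pair.
class SlidingWindowLimiter {
  private readonly windows = new Map<string, number[]>();

  // Returns 0 if the call is admitted now, otherwise the number of
  // milliseconds until the oldest timestamp leaves the window.
  tryAcquire(apiKey: string, provider: string, now: number = Date.now()): number {
    const key = `${provider}:${apiKey}`;
    const limit = DEFAULT_RPM.get(provider) ?? FALLBACK_RPM;
    // Keep only calls made within the past 60 seconds.
    const stamps = (this.windows.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
    this.windows.set(key, stamps);
    if (stamps.length < limit) {
      stamps.push(now);
      return 0;
    }
    return stamps[0] + WINDOW_MS - now;
  }
}

// Exponential backoff: 250, 500, 1000, 2000, 4000 ms for attempts 0..4,
// never exceeding the cap.
function backoffDelay(attempt: number, baseMs = 250, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

Because the window is a list of timestamps rather than a counter that resets each minute, the quota holds over every 60-second span, not only over calendar-aligned minutes.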

The key design choice: rate limiting decorates the provider rather than wrapping each call site. Callers never see the limiter. They call provider.chat() the same way they always did. The limiter intercepts at the provider boundary, queues if the sliding window is full, and serves the call once the window has space. Consumer code stayed identical before and after the change.
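To make the decorator idea concrete, here is a hedged sketch under several assumptions. The ChatProvider interface, the withRateLimit function, and the convention that a failed call throws an error carrying a numeric status field are all hypothetical; the source only confirms that callers use provider.chat(). For brevity the sketch keeps one window per wrapped provider instance instead of a map keyed by API key plus provider, and it injects sleep and now so the logic can be exercised without real timers.

```typescript
// Hypothetical provider shape; only chat() is confirmed by the text.
interface ChatProvider {
  readonly name: string;
  chat(prompt: string): Promise<string>;
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit failures surface as errors with status === 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: unknown }).status === 429;
}

// Returns an object with the same interface as `inner`, so call sites
// do not change. All limiting and retrying happens inside.
function withRateLimit(
  inner: ChatProvider,
  limitPerMinute: number,
  maxRetries = 5,
  sleep: Sleep = realSleep,
  now: () => number = Date.now,
): ChatProvider {
  const stamps: number[] = [];

  // Waits until the 60-second window has room, then records the call.
  async function waitForSlot(): Promise<void> {
    for (;;) {
      const t = now();
      while (stamps.length > 0 && t - stamps[0] >= 60_000) stamps.shift();
      if (stamps.length < limitPerMinute) {
        stamps.push(t);
        return;
      }
      await sleep(stamps[0] + 60_000 - t);
    }
  }

  return {
    name: inner.name,
    async chat(prompt: string): Promise<string> {
      for (let attempt = 0; ; attempt++) {
        await waitForSlot();
        try {
          return await inner.chat(prompt);
        } catch (err) {
          // Only 429s are retried; everything else propagates immediately.
          if (!isRateLimited(err) || attempt >= maxRetries) throw err;
          await sleep(Math.min(250 * 2 ** attempt, 30_000));
        }
      }
    },
  };
}
```

A caller would construct the provider once, for example `const provider = withRateLimit(rawProvider, 10)`, and keep calling `provider.chat(...)` exactly as before. Each retry also takes a slot in the window in this sketch, since a retried request still counts against the provider's quota.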

Results

Confirmed. Zero consumer changes were required across the entire engine. The rate limiter is invisible inside the provider layer. During the first week in which the engine ran against all 10 platform adapters simultaneously, the limiter caught two scenarios that would have cost real money: one where a storm of parallel workers hit Google’s Gemini endpoint at once, and one where an LLM-scored matching loop got into a tight retry cycle after a transient 5xx. In both cases the limiter absorbed the burst and serialized the calls behind the window.

Findings

Transparent rate limiting works because rate limits are a concern about aggregate behavior across time, not about any individual call. The moment you push the limit-check into the caller, you are asking each caller to reason about every other caller, which is exactly the kind of distributed-systems problem that a centralized choke-point avoids. The pattern persisted straight through to production: the same provider layer now ships inside the SaaS deployment, unchanged.

Next Steps

Extend the limiter to accept dynamic quotas from the provider’s own rate-limit headers rather than hardcoded defaults, so that a plan upgrade raises the ceiling automatically without a config edit.
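One possible shape for that extension, sketched under assumptions rather than taken from the engine. The function name quotaFromHeaders and the HeaderSource type are hypothetical. The two header names are the ones I believe OpenAI and Anthropic use for their request ceilings, but they should be checked against each provider's current documentation, as should the assumption that the value is a per-minute limit.

```typescript
// Minimal structural type; the Fetch API's Headers object satisfies it.
interface HeaderSource {
  get(name: string): string | null;
}

// Assumed header names; verify against each provider's documentation.
const LIMIT_HEADERS = [
  "x-ratelimit-limit-requests",
  "anthropic-ratelimit-requests-limit",
];

// Returns the request ceiling advertised by the provider, or the
// hardcoded fallback when no usable header is present.
function quotaFromHeaders(headers: HeaderSource, fallback: number): number {
  for (const name of LIMIT_HEADERS) {
    const raw = headers.get(name);
    if (raw === null) continue;
    const parsed = Number.parseInt(raw, 10);
    // Ignore missing, non-numeric, or non-positive values.
    if (Number.isFinite(parsed) && parsed > 0) return parsed;
  }
  return fallback;
}
```

Falling back to the configured default whenever the header is absent or malformed keeps the conservative limits in force for providers that do not advertise a quota.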

Source

jobs-apply engine rate-limiter.ts, commit dated 2026-02-27.