The Glossary
Every technical term used in this lab, explained for humans. Hover over a dotted-underline term anywhere on the site for a quick tooltip, or click through for the full explanation with a visualization.
A/B Test
Split your audience in two, show each group a different version, and let the numbers pick the winner.
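Here is a minimal sketch of one common way to "let the numbers pick the winner" for a yes/no metric such as conversion: a two-proportion z-test. It assumes each visitor either converts or doesn't, and that both groups have some conversions and some non-conversions (otherwise the standard error is zero). The function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare the conversion rates of variants A and B.

    Returns (z, two_sided_p_value). A positive z means B converts better than A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform identically.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the gap is unlikely to be noise. It doesn't say the gap is big enough to matter, so look at the rates too.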
Ablation Study
Remove one piece, measure the drop. Tells you which component is actually doing the work.
Adversarial-Competing Loop
Iterate by competing against itself or other agents, using adversarial pressure to force capability improvement. IO-6 in the iteration-objective taxonomy.
Appcast
An RSS-based XML feed that desktop applications poll to discover and download new versions automatically.
Automation Ratio
The fraction of AI pipeline session volume attributable to automated batch processes versus interactive human-driven sessions. The key metric for distinguishing genuine productivity from inflated session counts.
Benchmark Comparison
Measure your system against a known standard. Not 'is it good?' but 'is it better than the alternative?'
Bezier Curve
Mathematical curve for smooth, human-like mouse trajectories. Straight-line movements are a bot fingerprint.
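A cubic Bezier curve is a weighted blend of four points: the start, two control points that bend the path, and the end. This sketch samples a mouse path from it. The control points here are arbitrary, and the glossary doesn't say how the lab picks them (randomizing them per move is one plausible option).

```python
Point = tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Point on a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    b0, b1, b2, b3 = u**3, 3 * u**2 * t, 3 * u * t**2, t**3
    return (
        b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
        b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
    )


def mouse_path(start: Point, end: Point, c1: Point, c2: Point, steps: int = 20) -> list[Point]:
    """Sample steps + 1 points from start to end, bending through control points c1 and c2."""
    return [cubic_bezier(start, c1, c2, end, i / steps) for i in range(steps + 1)]


path = mouse_path(start=(0, 0), end=(400, 300), c1=(120, -40), c2=(330, 360))
```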
Brier Score
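A measure of how good your probability forecasts are: the average squared gap between the predicted probability and what actually happened (1 or 0). 0 is perfect; 0.25 is what you get by always saying 50%, i.e. coin-flip guessing.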
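The formula fits in a few lines. This sketch assumes binary outcomes encoded as 0 or 1, with each forecast given as the probability that the outcome is 1.

```python
def brier_score(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)
```

Always forecasting 0.5 scores exactly 0.25 no matter what happens, which is where the coin-flip baseline comes from.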
Canary Deployment
Route a small percentage of traffic to the new version. If nothing breaks, gradually increase.
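One common way to pick the "small percentage" is to hash a stable user id into buckets, so each user sticks to one version while the rollout widens. This is a sketch under that assumption. The function name, the 10,000-bucket resolution, and the rollout steps are illustrative, not the lab's actual setup.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: float) -> bool:
    """Deterministically send about canary_percent% of users to the new version.

    Hashing the user id (instead of picking randomly per request) keeps each user on
    the same version, and anyone already on the canary stays on it as the percent grows.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, i.e. 0.01% resolution
    return bucket < canary_percent * 100


# Hypothetical rollout schedule: widen only if error rates stay flat at each step.
for percent in (1, 5, 25, 50, 100):
    share = sum(routes_to_canary(f"user-{i}", percent) for i in range(10_000)) / 10_000
    print(f"{percent:>3}% target -> {share:.1%} of sample users on canary")
```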
Canvas 2D Rendering
Browser-native pixel drawing API for games, visualizations, and interactive graphics.
Cascade Attack
A multi-turn manipulation strategy that escalates context incrementally until a safety boundary is crossed.
CDP (Chrome DevTools Protocol)
The protocol for controlling Chrome programmatically. How the lab automates a real browser.
Chaos Engineering
Deliberately break things in production to find weaknesses before real failures do.
CVaR (Conditional Value at Risk)
The average loss across the worst-case tail of outcomes (for example, the worst 5%). Measures how bad things get when they get bad.
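The simplest estimator is historical: sort the observed returns, take the worst alpha fraction, and average them. This sketch assumes returns are given as fractions (-0.05 means a 5% loss). Other definitions handle the tail boundary slightly differently.

```python
import math


def historical_cvar(returns: list[float], alpha: float = 0.05) -> float:
    """Average loss over the worst alpha fraction of observed returns.

    Returned as a positive number: 0.08 means "in the worst tail you lose 8% on average".
    """
    if not returns or not 0 < alpha < 1:
        raise ValueError("need returns and 0 < alpha < 1")
    worst_first = sorted(returns)
    k = max(1, math.ceil(alpha * len(worst_first)))  # number of observations in the tail
    tail = worst_first[:k]
    return -sum(tail) / k
```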
Designed-Then-Built
A development mode where the design is fully resolved before the first line of code is committed, producing large initial commits and minimal post-initial structural change. Contrasts with scaffold-then-accrete.
DQI (Data Quality Index)
Composite score measuring how complete and correct your data is. Seven components, one number.
Drawdown
The fall from peak to trough. How much you lost before recovering, and how long it took.
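Both halves of the definition (how much and how long) come out of a single pass over an equity curve. This sketch assumes the equity values are strictly positive and measures duration in periods. A drawdown that hasn't recovered by the end still counts.

```python
def drawdown_stats(equity: list[float]) -> tuple[float, int]:
    """Return (max drawdown as a fraction of the peak, longest run of periods below a prior peak)."""
    peak = equity[0]
    max_drawdown = 0.0
    longest_underwater = 0
    underwater = 0
    for value in equity:
        if value >= peak:
            peak = value      # new high: any drawdown is over
            underwater = 0
        else:
            underwater += 1
            longest_underwater = max(longest_underwater, underwater)
            max_drawdown = max(max_drawdown, (peak - value) / peak)
    return max_drawdown, longest_underwater
```

For example, the curve `[100, 120, 90, 100, 130]` falls 25% from the 120 peak and spends two periods underwater before recovering.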
Electron
Framework for building desktop apps with web technologies (Chromium + Node.js).
Embeddings
Text converted to numbers that capture meaning. Similar ideas land near each other in vector space.
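"Near each other" is usually measured with cosine similarity, the angle between two vectors. The three-dimensional vectors below are made up for illustration. Real embedding models output hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 = same direction, 0.0 = unrelated (orthogonal), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example.
vectors = {
    "cat": [0.90, 0.10, 0.00],
    "kitten": [0.85, 0.20, 0.05],
    "carburetor": [0.00, 0.15, 0.95],
}
```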
Exponential Backoff
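Retry strategy that multiplies the wait time (typically doubling it) after each failure. Adding random jitter keeps many clients from retrying in lockstep (the thundering herd), while short early waits keep recovery fast.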
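A minimal retry wrapper, assuming the "full jitter" variant: the wait ceiling doubles each attempt, and the actual wait is drawn uniformly below it. The parameter defaults and the set of retryable exceptions are illustrative choices, and `sleep` is injectable so the logic can be tested without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on a retryable error, wait and try again.

    The wait ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay. The actual
    wait is drawn uniformly below that ceiling so clients don't all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```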
Fine-Tuning
Taking a pre-trained AI model and teaching it something specific with targeted examples.
Futures Contract
An agreement to buy or sell something at a set price on a future date. The backbone of commodities.
Gaussian Distribution
Bell curve probability distribution. Used for timing jitter that looks human-natural, not robotic-uniform.
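A sketch of Gaussian timing jitter using Python's built-in `random.Random.gauss`. The mean, spread, and floor values are made up. The floor guards against the rare very short or negative draw, which means the result is a clipped normal rather than a pure one.

```python
import random


def human_delay_ms(rng: random.Random, mean_ms: float = 180.0,
                   stdev_ms: float = 40.0, floor_ms: float = 30.0) -> float:
    """Draw a delay from a normal distribution, clamped so it's never implausibly short."""
    return max(floor_ms, rng.gauss(mean_ms, stdev_ms))


rng = random.Random(7)
delays = [human_delay_ms(rng) for _ in range(5)]
```

Most delays cluster near the mean and a few stray further out. Uniform jitter, by contrast, spreads them evenly between two hard edges.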
Goal-Seeking Loop
Iterate until a discrete completion condition is met. Build new things from specs. IO-1 in the iteration-objective taxonomy.
Goodhart Gaming
Greedy Parameter Sweep
Tune one setting at a time: sweep its values, lock in the best, then move on to the next. Far cheaper than trying every combination, but it can miss interactions between settings.
Hallucination
When AI confidently says something that isn't true. Plausible fiction presented as fact.
Hybrid Search
Combining BM25 keyword search with vector similarity search, fused via Reciprocal Rank Fusion (RRF).
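Reciprocal Rank Fusion ignores the raw scores and uses only each document's rank in each list, which is why it can merge BM25 and vector results that live on different scales. In this sketch, k = 60 is the common default from the original RRF paper. The lab's own setting isn't stated, and the doc ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids: each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword ranking
vector_hits = ["doc_b", "doc_c", "doc_a"]  # embedding-similarity ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['doc_b', 'doc_a', 'doc_c']
```

`doc_b` wins because both retrievers rank it near the top, even though neither puts it clearly ahead of everything else.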
Iteration Objective
The second axis of the self-improvement taxonomy: what the agent is iterating TOWARD (goal, metric, correctness, diversity, robustness, equilibrium).
Karpathy Ratchet
A quality metric that only goes up, never down. Each iteration must beat the previous best.
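The ratchet is just a guarded "best so far": a candidate is kept only if it strictly beats the recorded best. This sketch is an illustration. The class and method names are hypothetical, and a higher score is assumed to be better.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Ratchet:
    """Keeps the best candidate seen so far; a new one is accepted only if it scores strictly higher."""
    best_score: float = float("-inf")
    best_candidate: Optional[Any] = None

    def offer(self, candidate: Any, score: float) -> bool:
        if score > self.best_score:
            self.best_score = score
            self.best_candidate = candidate
            return True
        return False  # rejected: the recorded best never moves down
```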
Lifecycle Chain
The connective tissue of this lab: failure to pitfall to experiment to breakthrough. Nothing is wasted.
Lookahead Bias
Using data that wouldn't have been available at prediction time, such as tomorrow's closing price to decide today's trade. Always flatters backtest results. Always fails in production.
MAE (Mean Absolute Error)
The average distance between your predictions and reality. Lower is better.
MAPE (Mean Absolute Percentage Error)
MAE as a percentage. Makes errors comparable even when the numbers are wildly different scales.
MCP (Model Context Protocol)
Anthropic's standard for giving AI models tools and data. Like USB, but for AI connections.
Model Escalation
The pattern where an AI pipeline shifts to a more capable (and costly) model as task difficulty increases, either through explicit routing logic or emergent queue composition.
Model-Capability Harness
Eval archetype that measures raw model abilities using deterministic scoring. Think SAT for language models.
Monorepo
A single version-controlled repository containing multiple related packages or services with shared tooling.
Monte Carlo Simulation
Run thousands of random scenarios to map the range of possible outcomes. Dice rolls for decisions.
OAuth
Authorization protocol that lets apps access user data without sharing passwords.
Ownership Model
Rust's compile-time memory management system: every value has one owner, borrows are checked at compile time, and data races become impossible in safe code.
Pearson Correlation
Measures linear relationship between two variables, -1 to +1. Dangerous when used to validate derived scores against themselves.
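The second sentence is easy to demonstrate. In this sketch, a made-up composite score is built mostly from one of its inputs, so it correlates strongly with that input by construction, even though the input is just random noise. The names `completeness` and `accuracy` are hypothetical.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, from -1 (perfect inverse) to +1 (perfect linear)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
completeness = [rng.random() for _ in range(200)]  # pure noise
accuracy = [rng.random() for _ in range(200)]      # independent pure noise
score = [0.7 * c + 0.3 * a for c, a in zip(completeness, accuracy)]  # derived from completeness
# A high r here only reflects how the score was built, not that it measures anything real.
print(round(pearson(score, completeness), 2))
```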
Placeholder Sentinel
Polymarket
A prediction market where people bet real money on future events. The lab's oil model beats it.
Population-Evolving Loop
Iterate by maintaining a diverse population of agents or solutions, selecting the fittest, and mutating. IO-5 in the iteration-objective taxonomy.
Progressive Deployment
Deploy, measure breakage, fix, deploy again. Each iteration hardens the system through real-world contact.
Proof of Concept
Build the smallest possible version to prove the idea works before investing fully.
R-squared (R²)
How much of reality your model explains. 1.0 means it captures everything, 0 means it captures nothing.
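R² compares your model's squared errors with those of a baseline that always predicts the mean. Predicting the mean scores exactly 0. A model that does worse than that scores below 0, which can happen out of sample.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0: perfect
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0: no better than the mean
print(r_squared([1, 2, 3], [3, 2, 1]))  # -3.0: worse than the mean
```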
Ralph Loop
An autonomous build loop: iterate over spec files, invoke an AI agent per spec, assemble the whole system.
Rate Limiting
Capping how many requests a client can make in a given time window. A bouncer for APIs that prevents overload and abuse.
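Token bucket is one widely used rate-limiting algorithm (others include fixed and sliding windows). This is a sketch of it. The rate and capacity values are up to the caller, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject (e.g. HTTP 429 Too Many Requests) or wait
```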
Red Team Testing
Hire someone to attack your system on purpose. Find vulnerabilities before a real attacker does.
Reflection-Accumulating Loop
Iterate by building verbal self-critique memory. Each failure adds to an episodic buffer that steers future attempts. IO-3 in the iteration-objective taxonomy.
Regression Testing
Run all existing tests after every change. Make sure fixing one thing didn't break ten others.
RMSE (Root Mean Square Error)
Average prediction error that penalizes big misses more than small ones. Lower is better.
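Squaring before averaging is what makes big misses count more. In this example, four small misses and one large miss have the same MAE (1.0), but RMSE doubles for the single large miss.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Square the errors, average them, take the square root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


truth = [0.0, 0.0, 0.0, 0.0]
steady = [1.0, 1.0, 1.0, 1.0]   # four small misses: MAE 1.0
one_big = [0.0, 0.0, 0.0, 4.0]  # one big miss:      MAE 1.0
print(rmse(truth, steady), rmse(truth, one_big))  # 1.0 2.0
```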
Seeded PRNG
A pseudo-random number generator initialized with a fixed seed, producing identical sequences on every run.
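In Python, the usual pattern is a dedicated `random.Random(seed)` instance rather than seeding the global generator, so other code can't shift the sequence. One caveat: identical sequences are guaranteed for the same Python version. Helper methods like `randint` aren't promised to be stable across versions, and this generator is not suitable for security purposes.

```python
import random


def dice_rolls(seed: int, n: int = 5) -> list[int]:
    # A dedicated generator: seeding it neither disturbs nor depends on the global one.
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


print(dice_rolls(42) == dice_rolls(42))  # True: same seed, same sequence
```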
Session Architecture
How AI pipeline sessions are structured, batched, and sequenced, including model selection per session type, target ordering, and session-to-task granularity. The primary cost driver in high-volume AI pipelines.
Sharpe Ratio
Excess return (above the risk-free rate) per unit of volatility. Measures whether you're getting paid enough for the danger you're taking.
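A sketch of the annualized version computed from per-period returns. It assumes 252 trading periods per year, the usual convention for daily stock data. Scaling by the square root of that number also assumes returns are roughly independent from one period to the next.

```python
import math
import statistics


def annualized_sharpe(period_returns: list[float], risk_free_per_period: float = 0.0,
                      periods_per_year: int = 252) -> float:
    """Mean excess return divided by its standard deviation, scaled to a yearly figure."""
    excess = [r - risk_free_per_period for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)
```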
Snapshot Testing
Test by comparing output against a committed golden file; any change forces an explicit decision.
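A minimal snapshot check, assuming text output and a golden file path of your choosing. Here the "explicit decision" is the `update` flag. Without it, a missing or mismatched golden file fails instead of being silently rewritten.

```python
from pathlib import Path


def matches_snapshot(output: str, golden: Path, update: bool = False) -> bool:
    """Compare output with a committed golden file.

    Passing update=True is the explicit decision to accept the new output as the golden.
    """
    if update:
        golden.write_text(output, encoding="utf-8")
        return True
    if not golden.exists():
        raise FileNotFoundError(f"{golden} missing; rerun with update=True to create it")
    return golden.read_text(encoding="utf-8") == output
```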
Target Pool Composition
The distribution of task difficulty tiers within an AI pipeline's work queue. The primary determinant of effective cost per session; harder pools force model escalation regardless of configured routing defaults.
Tenant Isolation
Multi-tenant data architecture where one user's data cannot be read or written by another user, enforced at multiple layers.
Token
How AI reads text: not words, but chunks. In English, a token is roughly 3/4 of a word on average.
Transformer
The neural network architecture behind GPT, Claude, and nearly every modern LLM. Built on attention.
Volatility
How much a price bounces around. High volatility means big swings, not just big losses.
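A common way to measure this is the standard deviation of log returns, annualized with the square root of the number of periods per year (252 for daily trading data is an assumption here). Up-moves and down-moves count equally. The "choppy" series below ends exactly where it started, so it has no net loss, yet it is far more volatile than the "calm" one.

```python
import math
import statistics


def annualized_volatility(prices: list[float], periods_per_year: int = 252) -> float:
    """Standard deviation of log returns, scaled to a yearly figure."""
    log_returns = [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)


calm = [100, 101, 100, 101, 100]
choppy = [100, 110, 100, 110, 100]
print(f"calm {annualized_volatility(calm):.1%}, choppy {annualized_volatility(choppy):.1%}")
```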
WebSocket
A persistent two-way connection between client and server. Like a phone call, not sending letters.