Definitions
The Glossary
Every technical term used in this lab, explained for humans. Hover over a dotted-underline term anywhere on the site for a quick tooltip, or click through for the full explanation with a visualization.
A/B Test
Split your audience in two, show each group a different version, and let the numbers pick the winner.
Ablation Study
Remove one piece, measure the drop. Tells you which component is actually doing the work.
Baseline
The starting measurement everything gets compared to. Without one, improvement is just a claim.
Benchmark Comparison
Measure your system against a known standard. Not 'is it good?' but 'is it better than the alternative?'
Brier Score
A measure of how accurate your probability forecasts are. 0 is perfect; 0.25 is what always guessing 50/50 scores.
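A minimal sketch of the computation, assuming binary outcomes coded 0/1 (the function name is illustrative):

```python
def brier_score(probs, outcomes):
    """Mean squared gap between predicted probabilities and what happened (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Always guessing 0.5 scores 0.25 no matter what happens.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```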
Canary Deployment
Route a small percentage of traffic to the new version. If nothing breaks, gradually increase.
Canvas 2D Rendering
Browser-native pixel drawing API for games, visualizations, and interactive graphics.
CDP (Chrome DevTools Protocol)
The protocol for controlling Chrome programmatically. How the lab automates a real browser.
Chaos Engineering
Deliberately break things in production to find weaknesses before real failures do.
Control Group
The group that gets no change. The 'before' picture you compare everything against.
CVaR (Conditional Value at Risk)
The average loss in the worst-case scenarios. Measures how bad things get when they get bad.
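A toy sketch, assuming losses are positive numbers and `alpha` is the confidence level (names illustrative):

```python
def cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses."""
    n_tail = max(1, int(len(losses) * (1 - alpha)))
    tail = sorted(losses, reverse=True)[:n_tail]
    return sum(tail) / len(tail)

# For losses 1..100 at alpha=0.95, this averages the worst five: 96..100.
```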
DQI (Data Quality Index)
Composite score measuring how complete and correct your data is. Seven components, one number.
Drawdown
The fall from peak to trough. How much you lost before recovering, and how long it took.
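A sketch of maximum drawdown over a price series (an illustrative helper, not the lab's code):

```python
def max_drawdown(prices):
    """Largest peak-to-trough fall, as a fraction of the peak."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst

# 100 -> 120 -> 90 -> 130: fell 25% from the 120 peak before recovering.
```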
Electron
Framework for building desktop apps with web technologies (Chromium + Node.js).
Embeddings
Text converted to numbers that capture meaning. Similar ideas land near each other in vector space.
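"Near each other" is usually measured with cosine similarity; a toy example with hand-made vectors:

```python
import math

def cosine_similarity(a, b):
    """How aligned two embedding vectors are: 1.0 same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.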
ETL Pipeline
Extract-Transform-Load pattern for moving and reshaping data between systems.
Fine-Tuning
Taking a pre-trained AI model and teaching it something specific with targeted examples.
Futures Contract
An agreement to buy or sell something at a set price on a future date. The backbone of commodities.
Greedy Parameter Sweep
Tune one setting at a time, locking in the best value before moving to the next. Not exhaustive, but fast and reliable when the settings don't interact much.
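One way to sketch a greedy sweep: fix each parameter at its best value before tuning the next (all names illustrative):

```python
def greedy_sweep(score, grid):
    """Tune parameters one at a time, keeping the best value found so far."""
    best = {name: values[0] for name, values in grid.items()}
    for name, values in grid.items():
        for v in values:
            trial = {**best, name: v}
            if score(trial) > score(best):
                best = trial
    return best
```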
Hallucination
When AI confidently says something that isn't true. Plausible fiction presented as fact.
Hypothesis Test
A formal bet: state what you expect, run the experiment, and let the data confirm or reject it.
Karpathy Ratchet
A quality metric that only goes up, never down. Each iteration must beat the previous best.
Knowledge Graph
Connecting facts as a web of relationships, not rows in a table. Entities linked by meaning.
Lifecycle Chain
The connective tissue of this lab: failure to pitfall to experiment to breakthrough. Nothing is wasted.
LLM Agent Architecture
Design patterns for autonomous AI agents that use tools, memory, and planning.
MAE (Mean Absolute Error)
The average distance between your predictions and reality. Lower is better.
MAPE (Mean Absolute Percentage Error)
MAE as a percentage. Makes errors comparable even when the numbers are wildly different scales.
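Both metrics in a few lines (illustrative helpers; MAPE assumes no actual value is zero):

```python
def mae(preds, actuals):
    """Average absolute error, in the same units as the data."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

def mape(preds, actuals):
    """MAE expressed as a percentage of the actual values."""
    return 100 * sum(abs(p - a) / abs(a) for p, a in zip(preds, actuals)) / len(preds)
```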
MCP (Model Context Protocol)
Anthropic's standard for giving AI models tools and data. Like USB, but for AI connections.
Monte Carlo Simulation
Run thousands of random scenarios to map the range of possible outcomes. Dice rolls for decisions.
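A toy version for a price series, assuming daily returns are normally distributed (all names and parameters illustrative):

```python
import random

def simulate_final_prices(start, drift, vol, days, runs=10000, seed=0):
    """Run many random price paths and collect where each one ends up."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        price = start
        for _ in range(days):
            price *= 1 + rng.gauss(drift, vol)
        finals.append(price)
    return finals  # the distribution of possible outcomes
```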
Multi-Persona Audit
Three or more expert reviewers with different lenses audit the same system. Each lens catches blind spots the others miss.
OAuth
Authorization protocol that lets apps access user data without sharing passwords.
Overfitting
When a model memorizes training data instead of learning the pattern. Perfect in class, fails the exam.
Parameter Tuning
Adjusting the knobs on your model to find the best settings. Tuning explores; ratchets only go up.
Polymarket
A prediction market where people bet real money on future events. The lab's oil model beats it.
Progressive Deployment
Deploy, measure breakage, fix, deploy again. Each iteration hardens the system through real-world contact.
Proof of Concept
Build the smallest possible version to prove the idea works before investing fully.
Quantitative Audit
Multi-round validation: check internal consistency, then forward accuracy, then external benchmarks.
R-squared (R²)
How much of reality your model explains. 1.0 means it captures everything, 0 means it captures nothing.
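The standard formula, sketched (one minus the ratio of leftover error to total variance):

```python
def r_squared(preds, actuals):
    """Fraction of the variance in the actuals that the predictions explain."""
    mean = sum(actuals) / len(actuals)
    ss_res = sum((a - p) ** 2 for p, a in zip(preds, actuals))
    ss_tot = sum((a - mean) ** 2 for a in actuals)
    return 1 - ss_res / ss_tot

# Perfect predictions score 1.0; just predicting the mean scores 0.0.
```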
Ralph Loop
An autonomous build loop: iterate over spec files, invoke an AI agent per spec, assemble the whole system.
Rate Limiting
Controlling how fast you can make requests. A bouncer for APIs that prevents overload and abuse.
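One common implementation is a token bucket; a minimal sketch (not any particular library's API):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the bouncer says no
```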
Red Team Testing
Hire someone to attack your system on purpose. Find vulnerabilities before a real attacker does.
Regression Testing
Run all existing tests after every change. Make sure fixing one thing didn't break ten others.
RMSE (Root Mean Square Error)
Average prediction error that penalizes big misses more than small ones. Lower is better.
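A sketch showing why big misses cost more (function name illustrative):

```python
import math

def rmse(preds, actuals):
    """Square errors before averaging, so one big miss outweighs many small ones."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds))

# Errors [2, 2] and [0, 4] have the same MAE, but [0, 4] has a higher RMSE.
```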
Root Cause Analysis
Ask 'why?' five times. Don't fix the symptom; find and fix the actual cause underneath.
Sensitivity Analysis
Tweak each input and measure how much the output changes. Find which levers actually move the needle.
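A one-at-a-time sketch, the simplest form of sensitivity analysis (names illustrative):

```python
def sensitivity(model, base, step=0.01):
    """Nudge each input by `step` (relative) and report how much the output moves."""
    baseline = model(base)
    effects = {}
    for name, value in base.items():
        bumped = {**base, name: value * (1 + step)}
        effects[name] = model(bumped) - baseline
    return effects  # the biggest entries are the levers that matter
```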
Sharpe Ratio
Return per unit of risk. Measures whether you're getting paid enough for the danger you're taking.
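A sketch using population standard deviation (conventions on risk-free rate and annualization vary; names illustrative):

```python
import math

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the volatility of returns."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / len(excess)
    return mean / math.sqrt(var)
```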
Statistical Significance
Is your result real or just luck? The math that tells you whether to trust your experiment.
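A sketch of a two-proportion z-test, the usual math behind an A/B result; |z| above 1.96 corresponds to p < 0.05, two-sided (names illustrative):

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z-score for the difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# 10% vs 15% conversion over 1000 users each is well past the 1.96 threshold.
```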
Technical Spike
Time-boxed exploration to reduce risk before committing to a full build. Learning, not building.
Token
How AI reads text: not words, but chunks. A token is roughly 3/4 of a word on average.
Transformer
The neural network architecture behind GPT, Claude, and most modern LLMs. Built on attention.
Volatility
How much a price bounces around. High volatility means big swings, not just big losses.
WebSocket
A persistent two-way connection between client and server. Like a phone call, not sending letters.