Pitfall stella

Llm Audit Speed

April 7, 2026

pitfallai-agentsKNOWLEDGEaudit

Changelog

Date	Summary
2026-04-19	Initial creation from vault-engine-architecture idea; mitigated via tiered heuristic/Haiku/Sonnet scoring

LLM audit speed

When a vault or codebase audit is scored entirely by LLM calls, each individual check costs 1-5 seconds plus a tokens bill, and a full dimension audit (30-150 items) stretches to 30-60 seconds of wall time. Worse, LLM scorers drift into false-positive territory on boundary cases, which forces re-audit runs that double or triple the real cost. The audit that was supposed to be a lint step becomes the slowest step in the knowledge pipeline.

Symptom

A single dimension audit (topics, pitfalls, research) takes 30-60 seconds
Re-running the audit after a small edit costs the same; there is no incremental path
10-25% of flagged issues turn out to be false positives on inspection
Audit cost is the dominant budget line for the knowledge engine; developers stop running it
Obvious structural failures (missing frontmatter field, broken wikilink) take the same time and cost as subtle semantic failures

Root Cause

Per-frame LLM evaluation is the wrong tool for structural and schema checks. The model is asked to grade things a regex would answer in microseconds: is the created field present, is the domain one of ten allowed strings, does the body have two sentences. The LLM’s probabilistic nature then generates a false-positive tail (misreads formatting, flags valid-but-unusual phrasing) that forces re-runs and manual triage.

The deeper cause: the audit was designed as a one-pass LLM judge rather than a tiered pipeline. Cheap deterministic checks (schema, links, word count, regex patterns) were never peeled off into a pre-filter layer. Every check, regardless of semantic complexity, pays the full LLM cost.

Detection

Measure audit wall time and cost per dimension. If any of these hold, the pitfall is active:

Single-dimension audit exceeds 10 seconds of wall time
Re-audit of unchanged content costs the same as first audit (no incremental path)
False-positive rate above 10% on spot-checked flagged issues
Token cost per audit run tracks linearly with item count (no tiered short-circuit)
Developers avoid running the audit because it is too slow to iterate on

Remediation

Split the audit into a tiered pipeline. Run the cheapest stage first; only escalate to the next tier when the current stage cannot resolve the check:

Deterministic scanners (microseconds): schema validation, frontmatter field presence, wikilink integrity, word count, regex patterns, controlled-vocabulary membership. Written in Rust or Python. This handles ~70% of checks with zero LLM cost.
Haiku tier (100-500ms): semantic sanity checks that need language understanding but not deep reasoning: is this sentence meaningful, does this tag match the body content, is this voice compliant. Cheap model, tight prompt, cached where possible.
Sonnet tier (1-5s): hard semantic calls: is this claim supported by the cited research, does this body actually cover the topic, is the cross-link appropriate. Only invoked for items that passed tiers 1-2 but still need judgment.

Pair the tiering with content-hash invalidation: keep an audit_state table keyed by content hash, so unchanged items short-circuit the entire pipeline. This turns audit cost from linear in item count to linear in changed-item count.

The vault-engine-architecture idea from 2026-04-08 drove this exact restructure: see ideas/2026-04-08-vault-engine-architecture and the resulting breakthrough note.

ideas/2026-04-08-vault-engine-architecture: the idea that originally surfaced this pitfall
topics/pitfalls/circular-knowledge-corruption: sibling pitfall in the KNOWLEDGE cluster
topics/pitfalls/agent-memory-bloat: another audit-cost pattern: unbounded state growth
skills/audit-topics: deterministic scanner implementation for the topics dimension
skills/audit-pitfalls: deterministic scanner implementation for the pitfalls dimension
skills/karpathy-ratchet: iterative-optimization pattern underlies the tiered design
projects/stella/_index: parent project for the vault knowledge engine work