audit-topics
Audit the Topics dimension for domain classification gaps, missing cross-links, and sparse content
Changelog
260420: multiple edits
- v_migrate: Changelog migrated from table to YYMMDD H3 format per versioning-standard rule 2 (V1.6 of skills upgrade plan)
- v6: Added license, sources per V6.1/V6.2 of skills upgrade plan
260406: v1.2
- v1.2: Added check 7 (PII and secrets scan per _config/pii-rules.yaml)
- v1.4: Added checks 13-15 (voice & tone, tag completeness, graph connectivity)
- v1.3: Added check 12 (content length & structure compliance per content-length-spec.yaml)
260403: multiple edits
- Added Agent to mounted_allowed_tools + post-audit visual enrichment trigger
- Added Visual Enrichment section + self-improving-agent-patterns cross-reference
260402: v1.2: Breakthrough cross-ref updated: topics linked from breakthroughs should reference breakthroughs/ directory notes
260401: Added Changelog section presence check (temporal enrichment spec)
260331: Initial creation
Description
Audits the Topics/Domain dimension of the vault. Topics are the vault’s domain-knowledge layer: each one captures a reusable concept, technique, or pattern extracted from project work. This skill checks that every topic is properly classified, cross-linked, and substantive enough to be useful on its own.
The audit covers six checks per topic and two vault-wide coverage checks. It produces a structured report with specific remediation steps for every gap found.
Controlled Vocabulary: Valid Domains
| Domain | Scope |
|---|---|
| ml | Machine learning, deep learning, NLP, embeddings, model training |
| data-eng | ETL, pipelines, data quality, batch processing, streaming |
| quant-finance | Pricing models, Monte Carlo, risk, financial engineering |
| game-dev | Game mechanics, procedural generation, combat systems, rendering |
| ops | DevOps, deployment, CI/CD, monitoring, infrastructure |
| career | Job search, resume, interviews, professional development |
| omscs | Georgia Tech OMSCS coursework and academic topics |
| real-estate | Property analysis, market models, investment |
| ai-agents | LLM agents, tool use, multi-agent coordination, prompt engineering |
| frontend | Browser APIs, React, Canvas, CSS, UI/UX, Electron |
Interface
Trigger: Run when reviewing vault health, after bulk topic creation, or when onboarding a new domain.
Inputs:
- vault_path: root of the Obsidian vault (default ~/vault/)
- controlled_domains: the 10 valid domain strings listed above
- topic_glob: file pattern to match (default topics/*.md; explicitly excludes the topics/pitfalls/** and topics/bugs/** subdirectories)
Outputs:
- dimension_report: per-topic markdown table with pass/fail on each of the 6 checks
- gap_summary: aggregate counts of how many topics fail each check
- domain_coverage_matrix: 10 domains x presence (which domains have at least 1 topic)
- remediation_commands: one concrete edit instruction per gap (field to add, value to set, section to write)
Provenance
First created during the vault-bootstrap audit (2026-03-31) after noticing that bulk-generated topics often lacked domain, had empty related_projects, or had placeholder bodies. The controlled vocabulary was established by surveying all 36 initial topics and the 10 project domains they cover.
Usage Notes
- Run this audit after running vault-bootstrap lint: lint catches YAML syntax errors; this skill catches semantic gaps
- The audit reads only topics/*.md (flat), never descending into topics/pitfalls/ or topics/bugs/
- Topics with type: topic are the only files in scope; skip any file with a different or missing type
- The domain field is recommended but not lint-required: this audit flags missing domains as warnings, not errors
- The created field IS required for type: topic: flag missing created as an error
What to Check
1. Topic Changelog Section (WARN)
Every topic file in topics/*.md (excluding bugs/ and pitfalls/ subdirectories) should have a ## Changelog section after frontmatter to track temporal changes to living reference documents.
How to check:
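VAULT=${VAULT:-$HOME/vault}  # default vault root; override by exporting VAULT
for f in "$VAULT"/topics/*.md; do
  [ -f "$f" ] || continue  # skip the literal pattern when no topics exist
  if ! grep -q "^## Changelog" "$f"; then
    echo "WARN: $(basename "$f") missing ## Changelog section"
  fi
done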
Severity: WARN: Changelog sections track the evolution of reference knowledge. Missing changelogs reduce temporal traceability.
Per-Topic Checks (6 checks per file)
2. created field exists (REQUIRED)
Every type: topic note must have a created date in frontmatter. This is the only hard-required field beyond type.
How to detect: Parse YAML frontmatter, check for key created with a non-null value.
Severity: ERROR
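The detection step above can be sketched in shell. This is a minimal illustration, not the vault-bootstrap implementation: the helper name check_created is hypothetical, and it assumes frontmatter is delimited by bare --- lines with created written as a top-level key.

```shell
# Hypothetical helper: report a topic whose frontmatter lacks a non-null created value.
check_created() {
  f=$1
  # Keep only frontmatter lines (between the first and second ---), then read the key
  value=$(awk '/^---$/{n++; next} n==1' "$f" \
    | sed -n 's/^created:[[:space:]]*//p' | head -n 1)
  case "$value" in
    ""|null|"~")
      echo "ERROR: $(basename "$f" .md) missing created"
      return 1 ;;
  esac
  return 0
}
```

A caller would loop over "$VAULT"/topics/*.md and collect the ERROR lines into the dimension report.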
3. domain is set to a valid controlled vocabulary value (RECOMMENDED)
The domain field should be one of the 10 valid strings. Missing domain is a warning; an invalid domain value is an error.
How to detect: Check domain key exists and its value is in [ml, data-eng, quant-finance, game-dev, ops, career, omscs, real-estate, ai-agents, frontend].
Severity: WARN if missing, ERROR if present but invalid
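One possible shape for this check, shown as a sketch: check_domain is a hypothetical helper name, and the code assumes domain is a single-line top-level frontmatter key. It emits WARN for a missing value and ERROR for a value outside the vocabulary, mirroring the severities above.

```shell
VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"

# Hypothetical helper: classify one topic's domain field.
check_domain() {
  f=$1
  slug=$(basename "$f" .md)
  # Read the frontmatter domain value and strip optional YAML quotes
  domain=$(awk '/^---$/{n++; next} n==1' "$f" \
    | sed -n 's/^domain:[[:space:]]*//p' | head -n 1 | tr -d "\"'")
  if [ -z "$domain" ]; then
    echo "WARN: $slug has no domain"
    return 0
  fi
  for d in $VALID_DOMAINS; do
    [ "$d" = "$domain" ] && return 0
  done
  echo "ERROR: $slug has invalid domain '$domain'"
  return 1
}
```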
4. related_projects has at least 1 valid wikilink (RECOMMENDED)
The related_projects array should contain at least one entry matching the pattern [[projects/{name}/_index]].
How to detect: Parse the related_projects array. Each entry should be a string containing [[projects/ and /_index]]. Validate that the referenced file actually exists on disk.
Severity: WARN if empty or missing, ERROR if references a non-existent project
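A rough sketch of this check follows. The helper name check_related_projects is hypothetical, and the code assumes that a projects/{name}/_index link resolves to projects/{name}/_index.md under the vault root. Entries that do not match the pattern are silently ignored here; a fuller implementation would probably flag them.

```shell
# Hypothetical helper: usage is check_related_projects <vault_root> <topic_file>
check_related_projects() {
  vault=$1; f=$2
  slug=$(basename "$f" .md)
  # Collect the list items that follow related_projects: inside the frontmatter
  entries=$(awk '/^---$/{n++; next}
    n==1 && /^related_projects:/{found=1; next}
    n==1 && found && /^[[:space:]]*-[[:space:]]/{print; next}
    found{found=0}' "$f")
  if [ -z "$entries" ]; then
    echo "WARN: $slug has empty or missing related_projects"
    return 0
  fi
  status=0
  # Pull {name} out of each projects/{name}/_index link and test the file on disk
  for name in $(echo "$entries" | sed -n 's#.*\[projects/\([^]/|]*\)/_index.*#\1#p'); do
    if [ ! -f "$vault/projects/$name/_index.md" ]; then
      echo "ERROR: $slug references non-existent project $name"
      status=1
    fi
  done
  return $status
}
```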
5. Body has meaningful content (at least 2 sentences)
The body (everything after the closing --- of frontmatter) should contain real content, not just a heading.
How to detect: Strip YAML frontmatter and markdown headings. Count sentences (split on . or .\n). Require >= 2.
Severity: WARN
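The sentence count can be approximated with a short pipeline. This sketch uses a hypothetical helper, count_sentences, and treats a period followed by whitespace or end of line as a sentence boundary, so abbreviations such as "e.g." will inflate the count. It relies on grep -o, which GNU and BSD grep both provide.

```shell
# Hypothetical helper: print the approximate number of sentences in a topic body.
count_sentences() {
  # Drop frontmatter and heading lines, then count periods that end a sentence
  awk '/^---$/{n++; next} n>=2 && !/^#/' "$1" \
    | grep -oE '\.([[:space:]]|$)' | wc -l | tr -d ' '
}
```

A topic passes when the printed count is at least 2.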
6. At least 1 inbound wikilink from another note (not orphaned)
Every topic should be referenced from at least one project, experiment, or other topic via a [[topics/{name}]] wikilink.
How to detect: Search the entire vault (excluding the topic file itself) for [[topics/{topic-slug}]] or the aliased form [[topics/{topic-slug}|. At least 1 match required.
Severity: WARN
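As an illustration, the orphan search might look like the following. count_inbound is a hypothetical helper; it assumes slugs contain no regex metacharacters and uses the -r and --include options available in GNU and BSD grep. The pattern accepts a link target followed by ], | or #, so plain, aliased, and heading links all count.

```shell
# Hypothetical helper: usage is count_inbound <vault_root> <topic_slug>
count_inbound() {
  vault=$1; slug=$2
  # List notes that link to topics/{slug}, drop the topic's own file, count the rest
  grep -rlE --include='*.md' "\[topics/${slug}[]|#]" "$vault" 2>/dev/null \
    | grep -vxF "$vault/topics/${slug}.md" | wc -l | tr -d ' '
}
```

A result of 0 marks the topic as an orphan.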
7. ## Related Topics section links to sibling topics in the same domain
Topics should cross-link to other topics that share the same domain, creating a navigable knowledge cluster.
How to detect: Look for a ## Related Topics heading in the body. If present, verify it contains at least one [[topics/ wikilink. If domain is set, verify the linked topics share the same domain.
Severity: INFO (aspirational, not blocking)
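A sketch of the sibling check, under the assumption that linked topics live at topics/{slug}.md and declare domain as a top-level frontmatter key. Both helper names (topic_domain, check_related_topics) are hypothetical. Links to files that do not exist are skipped rather than reported, since this check is informational.

```shell
# Hypothetical helper: print the frontmatter domain of one topic file.
topic_domain() {
  awk '/^---$/{n++; next} n==1' "$1" | sed -n 's/^domain:[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: usage is check_related_topics <vault_root> <topic_file>
check_related_topics() {
  vault=$1; f=$2
  slug=$(basename "$f" .md)
  own=$(topic_domain "$f")
  # Topic links found between "## Related Topics" and the next H2 heading
  links=$(awk '/^## Related Topics/{found=1; next} found && /^## /{found=0} found' "$f" \
    | grep -oE '\[topics/[^]|#)]+' | sed 's#^\[topics/##' | sort -u)
  if [ -z "$links" ]; then
    echo "INFO: $slug has no sibling links under ## Related Topics"
    return 0
  fi
  for sib in $links; do
    sib_file="$vault/topics/$sib.md"
    [ -f "$sib_file" ] || continue
    sib_domain=$(topic_domain "$sib_file")
    if [ -n "$own" ] && [ "$sib_domain" != "$own" ]; then
      echo "INFO: $slug ($own) links to $sib ($sib_domain)"
    fi
  done
}
```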
Vault-Wide Checks (2 checks)
8. All 10 domains have at least 1 topic
Every domain in the controlled vocabulary should be represented by at least one topic.
How to detect: Group all topics by domain. Report any domain with 0 topics.
Severity: WARN
9. No topic has a domain value outside the controlled vocabulary
Catch typos, legacy values, or domains that should be added to the vocabulary.
How to detect: Collect all unique domain values across topics. Flag any not in the controlled vocabulary.
Severity: ERROR
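Both vault-wide checks can share one pass over the topic files. The sketch below uses a hypothetical helper, domain_coverage, and assumes domain is a single-line top-level frontmatter key. It prints one WARN per uncovered domain (check 8) and one ERROR per out-of-vocabulary value (check 9).

```shell
VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"

# Hypothetical helper: usage is domain_coverage <vault_root>
domain_coverage() {
  vault=$1
  # Unique domain values across all flat topic files
  seen=$(for f in "$vault"/topics/*.md; do
    [ -f "$f" ] || continue
    awk '/^---$/{n++; next} n==1' "$f" | sed -n 's/^domain:[[:space:]]*//p' | head -n 1
  done | tr -d "\"'" | sort -u)
  # Check 8: every controlled domain should appear at least once
  for d in $VALID_DOMAINS; do
    echo "$seen" | grep -qxF "$d" || echo "WARN: domain $d has 0 topics"
  done
  # Check 9: every value in use should belong to the vocabulary
  for d in $seen; do
    case " $VALID_DOMAINS " in
      *" $d "*) ;;
      *) echo "ERROR: out-of-vocabulary domain '$d'" ;;
    esac
  done
}
```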
10. Semantic tags present where applicable (RECOMMENDED)
Topics should carry semantic tags in their frontmatter tags array to enable cross-cutting graph queries. Three semantic tags are defined in the controlled vocabulary:
| Tag | When to Apply |
|---|---|
| architecture | Topic describes a structural pattern, system design, or architectural decision |
| breakthrough | Topic documents or is linked from a significant improvement (>=10% metric gain) |
| pitfall | Topic documents or is closely related to a failure pattern |
How to detect:
- For each topic, check if its body or title contains architectural keywords (pattern, architecture, system, design, pipeline, framework)
- If so, verify architecture is in the tags array
- For topics linked from experiment notes tagged #breakthrough (or referenced by notes in breakthroughs/), verify breakthrough is in their tags
- For topics linked from pitfall notes, verify pitfall is in their tags
Severity: WARN
11. Topics linked from lifecycle chain have corresponding tags (INFO)
Cross-reference topics against experiments, pitfalls, and skills:
- Topics referenced by experiments with the breakthrough tag (or by notes in breakthroughs/) should themselves carry breakthrough
- Topics referenced by pitfall notes should carry pitfall
- This enriches the vault graph and enables tag-based filtering in Obsidian
12. Content Length & Structure Compliance (WARN)
Every topics/*.md note (excluding topics/pitfalls/, topics/bugs/, and topics/false-positives/ subdirectories) must have body content within 300-600 words (1-2 pages) and include all required sections. Word count excludes YAML frontmatter, the ## Changelog section, and image embeds (![[...]]). Pitfall topics, bug topics, and false-positive topics are handled by their own audit skills and are excluded here. See _config/content-length-spec.yaml for the canonical spec.
Required sections: Overview, Related, Changelog
How to check:
for f in "$VAULT"/topics/*.md; do
# Skip subdirectories (pitfalls, bugs, false-positives are handled by other audits)
case "$f" in
*/topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
esac
slug=$(basename "$f" .md)
# Extract body: skip frontmatter, strip Changelog section and image embeds
body=$(awk '/^---$/{n++; next} n>=2' "$f" | sed '/^## Changelog/,/^## [^C]/{ /^## [^C]/!d; }' | grep -v '^!\[\[')
word_count=$(echo "$body" | wc -w | tr -d ' ')
if [ "$word_count" -lt 300 ]; then
echo "ERROR (stub): $slug is $word_count words (min 300)"
elif [ "$word_count" -gt 720 ]; then
echo "WARN (verbose): $slug is $word_count words (max 600)"
fi
for section in "Overview" "Related" "Changelog"; do
if ! grep -q "^## $section" "$f"; then
echo "MISSING SECTION: $slug lacks ## $section"
fi
done
done
Severity: ERROR if below 300 words (stub content); WARN if above 720 words (600 x 1.2, unfocused); ERROR if required section missing
13. Voice & Tone Compliance (WARN)
Topics follow a 60% clarity / 40% energy voice profile. The Overview section should open with a sentence matching the pattern “This concept matters because [reason]. Here’s how it works.”: the first clause establishes stakes, the second invites the reader in. Body sections clarify with concrete examples. Closing text (Related Topics or final paragraph) points forward to where this concept is applied in the vault.
Six-point checklist:
- Opening hook: Overview first sentence answers “why should I care about this concept?” (not “this topic covers…”)
- Jargon control: every domain-specific term is defined inline on first use or wikilinked to a definition note
- Concrete before abstract: at least one body section gives a specific example or vault application before naming the general pattern
- Impact quantified: at least one mention of what this concept enables or what breaks when it is misunderstood
- Active voice: descriptions use strong verbs: “prevents”, “accelerates”, “enables” rather than “is related to” / “can be seen in”
- Forward closing: last sentence of Overview or a Related section entry says what to explore next or which project applies this concept
How to check (passive voice and unexpanded acronym heuristic):
for f in "$VAULT"/topics/*.md; do
case "$f" in
*/topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
esac
slug=$(basename "$f" .md)
# Passive voice indicators (grep -c prints 0 itself on no match, so only swallow its exit status)
passive=$(grep -ciE "\b(was|were|is|are|been)\s+(found|observed|noted|shown|considered|used|done|made|given)\b" "$f" || true)
if [ "$passive" -gt 3 ]; then
echo "WARN [voice]: $slug: $passive passive constructions (target ≤3)"
fi
# Unexpanded acronyms (ALL-CAPS 2-5 chars with no surrounding parenthetical)
body=$(awk '/^---$/{n++; next} n>=2' "$f")
unexpanded=$(echo "$body" | grep -oE '\b[A-Z]{2,5}\b' | sort -u | while read -r acronym; do
echo "$body" | grep -qE "$acronym\s*\(|\($acronym\)" || echo "$acronym"
done | tr '\n' ' ')
if [ -n "$unexpanded" ]; then
echo "WARN [voice]: $slug: unexpanded acronyms: $unexpanded"
fi
done
Severity: WARN
14. Tag Completeness (WARN)
Every topic note’s tags array must be well-formed and carry the minimum vocabulary signals. A topic with no tags, or with only a single vague tag, is invisible to the cross-cutting Dataview queries that power the vault’s audit dashboards and article cluster generators.
Requirements:
- tags must be a YAML array (not an inline string)
- Minimum 2 tags
- Must include at least one domain tag from the controlled vocabulary: ml, data-eng, quant-finance, game-dev, ops, career, omscs, real-estate, ai-agents, frontend
- If the topic title or body contains architectural keywords (pattern, architecture, system, design, pipeline, framework), the tag architecture should be present
- Topics that appear to document a failure pattern but are not in the pitfalls/ subdir (e.g., they describe a known anti-pattern) should carry pitfall
How to check:
VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"
ARCH_KEYWORDS="pattern architecture system design pipeline framework"
for f in "$VAULT"/topics/*.md; do
case "$f" in
*/topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
esac
slug=$(basename "$f" .md)
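# Extract tags block (frontmatter only); accept any indentation before the list dash
tags=$(awk '/^---$/{n++; next} n==1 && /^tags:/{found=1; next} n==1 && found && /^[[:space:]]*-[[:space:]]/{print $2; next} found{found=0}' "$f" | tr -d "'\"")
# grep -c prints 0 itself when nothing matches, so only swallow its exit status
tag_count=$(echo "$tags" | grep -c '[^[:space:]]' || true)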
if [ "$tag_count" -lt 2 ]; then
echo "WARN [tags]: $slug: fewer than 2 tags ($tag_count found)"
fi
has_domain=false
for d in $VALID_DOMAINS; do
echo "$tags" | grep -q "^${d}$" && has_domain=true && break
done
$has_domain || echo "WARN [tags]: $slug: no domain tag (one of: $VALID_DOMAINS)"
# Suggest architecture tag if architectural keywords present in title or body
# Match keywords only: piping in the first five lines unconditionally made this test always true
combined=$(grep -iE "$(echo "$ARCH_KEYWORDS" | tr ' ' '|')" "$f" | head -3)
if [ -n "$combined" ] && ! echo "$tags" | grep -q '^architecture$'; then
echo "INFO [tags]: $slug: may warrant #architecture tag (architectural keywords detected)"
fi
done
Severity: WARN
15. Graph Connectivity (WARN)
Topics are the vault’s connective tissue: they are most valuable when other notes link to them and they link back. An isolated topic note accumulates knowledge but contributes nothing to the vault graph and will never surface through Obsidian navigation or Dataview relationship queries.
Requirements:
- Minimum 3 outbound wikilinks in the note body (topics are hubs: they should reference projects, experiments, and sibling topics)
- ## Related section must exist and contain at least 1 wikilink: enforces the required section from the topic spec
- related_projects frontmatter field must be populated: at least 1 entry linking to a project _index (check 4 validates the links resolve; this check ensures the field is non-empty)
- Cross-dimensional link: at least one body wikilink that crosses into a different vault dimension: experiments/, skills/, research/, projects/, or breakthroughs/
How to check:
for f in "$VAULT"/topics/*.md; do
case "$f" in
*/topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
esac
slug=$(basename "$f" .md)
body=$(awk '/^---$/{n++; next} n>=2' "$f")
# Total outbound wikilinks in body
total_links=$(echo "$body" | grep -oE '\[\[[^]]+\]\]' | wc -l | tr -d ' ')
if [ "$total_links" -lt 3 ]; then
echo "WARN [graph]: $slug: only $total_links outbound wikilinks (min 3 for a topic hub)"
fi
# ## Related section with at least 1 wikilink
related_section=$(echo "$body" | awk '/^## Related/{found=1; next} found && /^## /{found=0} found{print}')
related_links=$(echo "$related_section" | grep -cE '\[\[' || true)
if [ "$related_links" -lt 1 ]; then
echo "WARN [graph]: $slug: ## Related section missing or has no wikilinks"
fi
# related_projects frontmatter populated
rp_count=$(awk '/^---$/{n++; next} n==1 && /^related_projects:/{found=1; next} n==1 && found && /^[[:space:]]*-[[:space:]]/{count++; next} found{found=0} END{print count+0}' "$f")
if [ "$rp_count" -lt 1 ]; then
echo "WARN [graph]: $slug: related_projects is empty (topic should link to ≥1 project)"
fi
# Cross-dimensional link
cross_dim=$(echo "$body" | grep -cE '\[\[(experiments|skills|research|projects|breakthroughs)/' || true)
if [ "$cross_dim" -lt 1 ]; then
echo "INFO [graph]: $slug: no cross-dimensional link in body"
fi
done
Severity: WARN for fewer than 3 outbound links, missing Related wikilinks, or empty related_projects; INFO for missing cross-dimensional link
7. PII and Secrets Scan (ERROR/WARN)
Every topic note must be free of real API keys, database credentials, and personal contact information. Env var names in documentation are fine; env var values are not. See _config/pii-rules.yaml for the canonical blocked and redact pattern lists.
Blocked patterns (ERROR): API keys (sk-, AIza, ghp_, xoxb-, AKIA, whsec_, sk_live_, sk_test_, re_), database connection strings with embedded passwords, env var assignments with real values.
Redact patterns (WARN): Personal email addresses, GCP project numbers, phone numbers, SSNs, credit card numbers, street addresses. Replace with placeholders (e.g., <personal-email>, <gcp-project-id>).
VAULT=~/vault
for f in "$VAULT"/topics/*.md; do
[ -f "$f" ] || continue
slug=$(basename "$f")
if grep -qE 'sk-[a-zA-Z0-9]{20,}|AIza[a-zA-Z0-9_-]{35}|ghp_[a-zA-Z0-9]{36}|xoxb-|AKIA[A-Z0-9]{16}|whsec_|sk_live_|sk_test_|re_[a-zA-Z0-9]{20,}' "$f"; then
echo "ERROR (PII): $slug contains a blocked secret pattern"
fi
if grep -qE 'postgres(ql)?://[^:]+:[^@]+@|mongodb(\+srv)?://[^:]+:[^@]+@' "$f"; then
echo "ERROR (PII): $slug contains a database connection string with credentials"
fi
if grep -qE 'alexdgutierreza@gmail\.com|616560719313' "$f"; then
echo "WARN (PII): $slug contains personal email or GCP project number: replace with placeholder"
fi
if grep -qE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' "$f"; then
echo "ERROR (PII): $slug may contain an SSN"
fi
done
Severity: ERROR for blocked secrets (must fix before commit). WARN for redact patterns (fix before any content flows to public destinations).
Reference: _config/pii-rules.yaml: canonical pattern list and zone-based escalation rules.
What Complete Looks Like
A fully healthy Topics dimension meets all of the following:
- 36/36 topics have a valid created date
- 36/36 topics have a domain from the controlled vocabulary
- 36/36 topics have related_projects with >= 1 valid [[projects/.../_index]] wikilink
- 36/36 topics have >= 2 sentences of meaningful body content
- 0 orphan topics: every topic has >= 1 inbound [[topics/{name}]] link from a project, experiment, or sibling topic
- 10/10 domains represented: every domain in the controlled vocabulary has at least 1 topic
- 0 invalid domain values: no typos or out-of-vocabulary domains
- Semantic tags applied: topics describing patterns have architecture in tags; topics linked from breakthroughs/ notes or #breakthrough-tagged experiments have breakthrough
- vault-bootstrap lint passes with 0 errors on all topic files
Example Passing Report
Topics Dimension Audit: 2026-03-31
====================================
Per-Topic Results (36 topics):
| Topic | created | domain | rel_proj | body | inbound | related |
|--------------------------------|---------|-----------|----------|------|---------|---------|
| accumulate-then-flush | PASS | data-eng | PASS (1) | PASS | PASS | PASS |
| behavioral-detection-counter… | PASS | career | PASS (1) | PASS | PASS | PASS |
| ... | ... | ... | ... | ... | ... | ... |
Summary:
created: 36/36 PASS
domain: 36/36 PASS (10/10 domains covered)
related_projects: 36/36 PASS
body content: 36/36 PASS
inbound links: 36/36 PASS
related topics: 36/36 PASS
Domain Coverage:
ml: 3 topics | data-eng: 5 | quant-finance: 2 | game-dev: 4 | ops: 3
career: 4 | omscs: 1 | real-estate: 1 | ai-agents: 5 | frontend: 4
How to Fill Gaps
Stamp last_audited
Every note you audit or create must have last_audited: YYYY-MM-DD in its frontmatter (today’s date). This enables vault-bootstrap stale to detect notes whose source files changed after the last audit. If the field is missing, add it. If it exists, update it to today.
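The stamping rule could be automated along these lines. stamp_last_audited is a hypothetical helper, and the sketch assumes the note already has a frontmatter block delimited by bare --- lines; a file without frontmatter is left unchanged.

```shell
# Hypothetical helper: set last_audited to today's date in a note's frontmatter.
stamp_last_audited() {
  f=$1
  today=$(date +%Y-%m-%d)
  tmp=$(mktemp)
  awk -v d="$today" '
    /^---$/ {
      n++
      # Closing delimiter reached without seeing the key: add it just before
      if (n == 2 && !stamped) { print "last_audited: " d; stamped = 1 }
      print; next
    }
    # Existing key inside the frontmatter: overwrite its value
    n == 1 && /^last_audited:/ { print "last_audited: " d; stamped = 1; next }
    { print }' "$f" > "$tmp" && cat "$tmp" > "$f" && rm -f "$tmp"
}
```

Writing through a temporary file and copying the content back keeps the original file's permissions intact.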
Missing created
Set to the date the topic was first created. If unknown, use the earliest commit date of the file or today’s date.
created: 2026-03-31
Missing or invalid domain
Infer the domain from the topic’s name, body content, and which projects reference it:
| If the topic involves… | Assign domain |
|---|---|
| Pipelines, ETL, data quality, batch | data-eng |
| Browser, React, Canvas, CSS, Electron | frontend |
| Monte Carlo, pricing, risk, portfolio | quant-finance |
| Game mechanics, procedural gen, combat | game-dev |
| Deploy, CI/CD, Railway, Docker, cron | ops |
| Job search, interviews, resume, hiring | career |
| LLM, agent, tool use, prompt, MCP | ai-agents |
| ML models, embeddings, training, NLP | ml |
| OMSCS, coursework, academic | omscs |
| Property, market analysis, investment | real-estate |
If a topic spans two domains, pick the primary one and add the secondary as a tag.
Empty related_projects
Match the topic to projects by checking:
- Does any project’s stack or description mention this topic’s concept?
- Does any project body contain [[topics/{this-topic}]]?
- Does the topic’s project: legacy field name a project?
Then add:
related_projects:
  - "[[projects/{matched-project}/_index]]"
Sparse body (< 2 sentences)
Expand the body using this template:
## Overview
{1-2 sentences: what this technique/concept is and why it matters.}
## Applications
- **{Project}**: {How this topic is applied in the project. 1-2 sentences.}
## Key Concepts
- {Concept 1}
- {Concept 2}
- {Concept 3}
Draw content from the projects listed in related_projects: the project _index.md and experiment notes are the primary source.
Orphan topics (0 inbound links)
Find which project should reference this topic:
- Check related_projects: those projects should link back
- Open each project’s _index.md and add a [[topics/{slug}]] wikilink in the body where relevant
If no project references the topic, consider whether the topic is truly standalone or should be merged into a parent topic.
Missing ## Related Topics section
- Find all topics sharing the same
domain - Pick 2-4 that are most conceptually related
- Add a section at the end of the file:
## Related Topics
- [[topics/{sibling-1}]]: {one-line relationship}
- [[topics/{sibling-2}]]: {one-line relationship}
Quality Checks
1. Lint passes
Run vault-bootstrap lint after any remediation. Zero errors required.
2. Domain coverage is complete
After remediation, verify all 10 domains have >= 1 topic. If a domain has 0 topics, either:
- Create a new topic for that domain from existing project knowledge
- Reassign a topic whose domain was incorrectly inferred
3. No empty related_projects
Every topic should link to at least one project. A topic with no project connection is either:
- Orphaned knowledge that should be linked to a project
- A stub that should be expanded or deleted
4. Cross-link reciprocity
For every related_projects entry [[projects/X/_index]], verify that project X’s body contains [[topics/{this-topic}]]. Non-reciprocal links indicate the project notes are stale.
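A reciprocity pass might be sketched as follows. check_reciprocity is a hypothetical helper; it assumes project notes live at projects/{name}/_index.md and, for brevity, scans the whole frontmatter for project links rather than only the related_projects key.

```shell
# Hypothetical helper: usage is check_reciprocity <vault_root> <topic_file>
check_reciprocity() {
  vault=$1; f=$2
  slug=$(basename "$f" .md)
  # Project names referenced from the topic's frontmatter
  awk '/^---$/{n++; next} n==1' "$f" \
    | sed -n 's#.*\[projects/\([^]/|]*\)/_index.*#\1#p' | sort -u \
    | while read -r name; do
        idx="$vault/projects/$name/_index.md"
        [ -f "$idx" ] || continue
        # The project note should link back to this topic
        grep -qE "\[topics/${slug}[]|#]" "$idx" \
          || echo "STALE: projects/$name does not link back to topics/$slug"
      done
}
```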
5. Domain consistency
Spot-check 5 random topics: does the domain value make sense given the body content? Flag any misclassifications.
6. No duplicate coverage
Two topics should not cover the same concept. If they do, merge the smaller into the larger and add an alias. Check by looking for topics with overlapping aliases or near-identical ## Overview sections.
Visual Enrichment
When this audit produces output that benefits from visualization:
| Finding Type | Tool | Specification |
|---|---|---|
| Domain coverage | R viz (skills/r-visualization-pipeline) | Family: COM, Template: Journal |
| Topic link density | R viz (skills/r-visualization-pipeline) | Family: NET, Template: Journal |
See topics/visual-output-routing for the full routing decision framework.
Self-improvement context: This audit skill implements the lint cycle of Pattern 4 (Compiler Wiki) from skills/self-improving-agent-patterns. The vault’s audit skills collectively form the lint+heal loop described in research/2026-04-02-karpathy-llm-knowledge-base-pattern.
Post-Audit Visual Enrichment Trigger
After completing all quality checks, dispatch a Sonnet subagent to generate visuals for this dimension:
> [!tip] Auto-generate visuals after audit
> Use the Agent tool to dispatch a subagent that runs skills/wiki-visual-enrichment for the topics dimension. This generates R charts and Figma diagrams based on the Visual Enrichment specifications above. Content-hash dedup ensures only changed articles get new visuals.