Skill

audit-topics

audit, dimension-health
Trigger

Audit the Topics dimension for domain classification gaps, missing cross-links, and sparse content

Version: 260420

Changelog

260420: multiple edits

  • v_migrate: Changelog migrated from table to YYMMDD H3 format per versioning-standard rule 2 (V1.6 of skills upgrade plan)
  • v6: Added license, sources per V6.1/V6.2 of skills upgrade plan

260406: v1.2

  • v1.2: Added check 7 (PII and secrets scan per _config/pii-rules.yaml)
  • v1.4: Added checks 13-15 (voice & tone, tag completeness, graph connectivity)
  • v1.3: Added check 12 (content length & structure compliance per content-length-spec.yaml)

260403: multiple edits

  • Added Agent to mounted_allowed_tools + post-audit visual enrichment trigger
  • Added Visual Enrichment section + self-improving-agent-patterns cross-reference

260402: v1.2: Breakthrough cross-ref updated: topics linked from breakthroughs should reference breakthroughs/ directory notes

260401: Added Changelog section presence check (temporal enrichment spec)

260331: Initial creation


Description

Audits the Topics/Domain dimension of the vault. Topics are the vault’s domain-knowledge layer: each one captures a reusable concept, technique, or pattern extracted from project work. This skill checks that every topic is properly classified, cross-linked, and substantive enough to be useful on its own.

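The core audit covers six checks per topic and two vault-wide coverage checks. Checks added later cover Changelog presence, semantic tags, content length and structure, voice and tone, tag completeness, graph connectivity, and PII and secrets. The skill produces a structured report with specific remediation steps for every gap found.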

Controlled Vocabulary: Valid Domains

| Domain | Scope |
|--------|-------|
| ml | Machine learning, deep learning, NLP, embeddings, model training |
| data-eng | ETL, pipelines, data quality, batch processing, streaming |
| quant-finance | Pricing models, Monte Carlo, risk, financial engineering |
| game-dev | Game mechanics, procedural generation, combat systems, rendering |
| ops | DevOps, deployment, CI/CD, monitoring, infrastructure |
| career | Job search, resume, interviews, professional development |
| omscs | Georgia Tech OMSCS coursework and academic topics |
| real-estate | Property analysis, market models, investment |
| ai-agents | LLM agents, tool use, multi-agent coordination, prompt engineering |
| frontend | Browser APIs, React, Canvas, CSS, UI/UX, Electron |

Interface

Trigger: Run when reviewing vault health, after bulk topic creation, or when onboarding a new domain.

Inputs:

  • vault_path: root of the Obsidian vault (default ~/vault/)
  • controlled_domains: the 10 valid domain strings listed above
  • topic_glob: file pattern to match (default topics/*.md, explicitly excludes topics/pitfalls/** and topics/bugs/** subdirectories)

Outputs:

  • dimension_report: per-topic markdown table with pass/fail on each of the 6 checks
  • gap_summary: aggregate counts: how many topics fail each check
  • domain_coverage_matrix: 10 domains x presence (which domains have at least 1 topic)
  • remediation_commands: one concrete edit instruction per gap (field to add, value to set, section to write)

Provenance

First created during the vault-bootstrap audit (2026-03-31) after noticing that bulk-generated topics often lacked domain, had empty related_projects, or had placeholder bodies. The controlled vocabulary was established by surveying all 36 initial topics and the 10 project domains they cover.

Usage Notes

  • Run this audit after running vault-bootstrap lint: lint catches YAML syntax errors; this skill catches semantic gaps
  • The audit reads only topics/*.md (flat), never descending into topics/pitfalls/ or topics/bugs/
  • Topics with type: topic are the only files in scope; skip any file with a different or missing type
  • The domain field is recommended but not lint-required: this audit flags missing domains as warnings, not errors
  • The created field IS required for type: topic: flag missing created as an error

What to Check

1. Topic Changelog Section (WARN)

Every topic file in topics/*.md (excluding bugs/ and pitfalls/ subdirectories) should have a ## Changelog section after frontmatter to track temporal changes to living reference documents.

How to check:

for f in "$VAULT"/topics/*.md; do
  if ! grep -q "^## Changelog" "$f"; then
    echo "WARN: $(basename "$f") missing ## Changelog section"
  fi
done

Severity: WARN: Changelog sections track the evolution of reference knowledge. Missing changelogs reduce temporal traceability.

Per-Topic Checks (6 checks per file)

2. created field exists (REQUIRED)

Every type: topic note must have a created date in frontmatter. This is the only hard-required field beyond type.

How to detect: Parse YAML frontmatter, check for key created with a non-null value.

Severity: ERROR
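
The detection step for check 2 can be sketched in shell as follows. This is a minimal sketch, not the skill's actual implementation: the helper names fm_value and check_created are hypothetical, and it assumes frontmatter is delimited by lines that are exactly --- and that created is a top-level key.

```shell
# Hypothetical helper: print the value of a top-level frontmatter key (empty if absent).
# Assumes the frontmatter block is delimited by lines that are exactly '---'.
fm_value() {
  # $1 = file, $2 = key
  awk -v key="$2" '
    /^---$/ { n++; next }
    n == 1 && index($0, key ":") == 1 {
      val = substr($0, length(key) + 2)
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim surrounding whitespace
      print val
      exit
    }
    n >= 2 { exit }                      # never read past the frontmatter
  ' "$1"
}

# Hypothetical check: report an ERROR when created is missing or null.
check_created() {
  # $1 = topic file
  val=$(fm_value "$1" created)
  case "$val" in
    ""|null|"~")
      echo "ERROR: $(basename "$1" .md) missing created"
      return 1
      ;;
  esac
  return 0
}
```

A caller would loop over "$VAULT"/topics/*.md and run check_created on each file. A created: line that appears only in the body is ignored because the parser stops at the closing ---.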

3. domain is in the controlled vocabulary

The domain field should be one of the 10 valid strings. Missing domain is a warning; an invalid domain value is an error.

How to detect: Check domain key exists and its value is in [ml, data-eng, quant-finance, game-dev, ops, career, omscs, real-estate, ai-agents, frontend].

Severity: WARN if missing, ERROR if present but invalid
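
The three-way severity rule for the domain field can be sketched as a small classifier. The function name classify_domain is hypothetical; it takes the value already read from frontmatter (empty when the key is absent) and compares it against the controlled vocabulary listed above.

```shell
VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"

# Hypothetical classifier: prints PASS, WARN (missing), or ERROR (present but invalid).
classify_domain() {
  # $1 = domain value as read from frontmatter (may be empty)
  if [ -z "$1" ]; then
    echo "WARN"
    return
  fi
  for d in $VALID_DOMAINS; do
    if [ "$1" = "$d" ]; then
      echo "PASS"
      return
    fi
  done
  echo "ERROR"
}
```

The comparison is exact and case-sensitive, so a value such as ML or data-engineering is reported as ERROR rather than silently normalized.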

4. related_projects links to at least one existing project

The related_projects array should contain at least one entry matching the pattern [[projects/{name}/_index]].

How to detect: Parse related_projects array. Each entry should be a string containing [[projects/ and /_index]]. Validate that the referenced file actually exists on disk.

Severity: WARN if empty or missing, ERROR if references a non-existent project
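
One way to sketch the related_projects check in shell is shown below. The function name check_related_projects is hypothetical. The sketch assumes block-style YAML lists (one "- " entry per line) and that a link [[projects/{name}/_index]] resolves to the file projects/{name}/_index.md under the vault root.

```shell
# Hypothetical check: WARN when related_projects is empty, ERROR for links to missing projects.
check_related_projects() {
  # $1 = vault root, $2 = topic file
  slug=$(basename "$2" .md)
  # Collect list items under related_projects: (frontmatter only), then keep the link target.
  targets=$(awk '
    /^---$/ { n++; in_rp = 0; next }
    n == 1 && /^related_projects:/ { in_rp = 1; next }
    n == 1 && in_rp && /^[ \t]*- / { print; next }
    { in_rp = 0 }
    n >= 2 { exit }
  ' "$2" | sed -n 's#.*\[\[\(projects/[^]|]*/_index\)[]|].*#\1#p')
  if [ -z "$targets" ]; then
    echo "WARN: $slug has no related_projects entries"
    return 0
  fi
  echo "$targets" | while IFS= read -r t; do
    # Assumption: the wikilink target maps to <vault>/<target>.md on disk.
    [ -f "$1/$t.md" ] || echo "ERROR: $slug references missing $t"
  done
}
```

Entries that do not match the [[projects/.../_index]] shape are dropped by the sed filter, so a list containing only malformed entries is reported the same way as an empty one.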

5. Body has meaningful content (at least 2 sentences)

The body (everything after the closing --- of frontmatter) should contain real content, not just a heading.

How to detect: Strip YAML frontmatter and markdown headings. Count sentences (split on . or .\n). Require >= 2.

Severity: WARN
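
The sentence-count heuristic can be sketched as follows. The function name count_sentences is hypothetical, and the sketch counts periods followed by whitespace, so abbreviations such as "e.g." inflate the total. It assumes the file starts with a frontmatter block; a file without one yields 0.

```shell
# Hypothetical helper: print the number of sentence-ending periods in the note body.
count_sentences() {
  # $1 = topic file
  awk '
    /^---$/ && n < 2 { n++; next }   # frontmatter delimiters
    n < 2 { next }                   # still inside (or before) frontmatter
    /^#/ { next }                    # drop markdown headings
    { print }
  ' "$1" | tr '\n' ' ' | grep -o '\.[[:space:]]' | wc -l | tr -d ' '
}
```

A caller would flag the topic with WARN when the result is below 2, for example: [ "$(count_sentences "$f")" -ge 2 ] || echo "WARN: sparse body".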

6. Topic has at least one inbound link

Every topic should be referenced from at least one project, experiment, or other topic via a [[topics/{name}]] wikilink.

How to detect: Search the entire vault (excluding the topic file itself) for [[topics/{topic-slug}]] or [. At least 1 match required.

Severity: WARN
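
The inbound-link search can be sketched with find and a fixed-string grep. The function name inbound_count is hypothetical. As an assumption beyond the text above, the sketch also counts the aliased Obsidian form [[topics/{slug}|label]], and it expects the vault root to be passed without a trailing slash.

```shell
# Hypothetical helper: print how many notes other than the topic itself link to it.
inbound_count() {
  # $1 = vault root (no trailing slash), $2 = topic slug
  find "$1" -type f -name '*.md' ! -path "$1/topics/$2.md" \
    -exec grep -lF -e "[[topics/$2]]" -e "[[topics/$2|" {} + | wc -l | tr -d ' '
}
```

Matching on the closing ]] or the alias separator keeps a slug such as foo from being counted when only foo-bar is linked. A result of 0 marks the topic as an orphan.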

7. Related Topics section links same-domain topics

Topics should cross-link to other topics that share the same domain, creating a navigable knowledge cluster.

How to detect: Look for a ## Related Topics heading in the body. If present, verify it contains at least one [[topics/ wikilink. If domain is set, verify the linked topics share the same domain.

Severity: INFO (aspirational, not blocking)

Vault-Wide Checks (2 checks)

8. All 10 domains have at least 1 topic

Every domain in the controlled vocabulary should be represented by at least one topic.

How to detect: Group all topics by domain. Report any domain with 0 topics.

Severity: WARN
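
Grouping topics by domain can be sketched as a per-domain count. The function name domain_coverage is hypothetical. It reads domain from the frontmatter block only and assumes unquoted values (domain: ml, not domain: "ml").

```shell
VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"

# Hypothetical report: one line per domain, WARN for domains with no topics.
domain_coverage() {
  # $1 = topics directory (flat; subdirectories are not visited)
  for d in $VALID_DOMAINS; do
    count=0
    for f in "$1"/*.md; do
      [ -f "$f" ] || continue
      # Read the domain value from the frontmatter block only
      value=$(awk '/^---$/{n++; next} n==1 && /^domain:/{sub(/^domain:[ \t]*/, ""); print; exit} n>=2{exit}' "$f")
      if [ "$value" = "$d" ]; then
        count=$((count + 1))
      fi
    done
    if [ "$count" -eq 0 ]; then
      echo "WARN: domain $d has 0 topics"
    else
      echo "$d: $count"
    fi
  done
}
```

The sketch rereads every file once per domain, which is acceptable for a vault of a few dozen topics; a single awk pass would scale better.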

9. No topic has a domain value outside the controlled vocabulary

Catch typos, legacy values, or domains that should be added to the vocabulary.

How to detect: Collect all unique domain values across topics. Flag any not in the controlled vocabulary.

Severity: ERROR

10. Semantic tags are applied (WARN)

Topics should carry semantic tags in their frontmatter tags array to enable cross-cutting graph queries. Three semantic tags are defined in the controlled vocabulary:

| Tag | When to Apply |
|-----|---------------|
| architecture | Topic describes a structural pattern, system design, or architectural decision |
| breakthrough | Topic documents or is linked from a significant improvement (>=10% metric gain) |
| pitfall | Topic documents or is closely related to a failure pattern |

How to detect:

  1. For each topic, check if its body or title contains architectural keywords (pattern, architecture, system, design, pipeline, framework)
  2. If so, verify architecture is in the tags array
  3. For topics linked from experiment notes tagged #breakthrough (or referenced by notes in breakthroughs/), verify breakthrough is in their tags
  4. For topics linked from pitfall notes, verify pitfall is in their tags

Severity: WARN
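
Step 3 above (topics referenced by notes in breakthroughs/ should carry breakthrough) can be sketched as follows. The function names are hypothetical. The sketch assumes block-style tags lists, covers only links from the breakthroughs/ directory (not #breakthrough-tagged experiments), and treats the aliased form [[topics/{slug}|label]] as a link.

```shell
# Hypothetical helper: succeed when any note under breakthroughs/ links to the topic.
linked_from_breakthroughs() {
  # $1 = vault root, $2 = topic slug
  [ -d "$1/breakthroughs" ] || return 1
  find "$1/breakthroughs" -type f -name '*.md' \
    -exec grep -lF -e "[[topics/$2]]" -e "[[topics/$2|" {} + | grep -q .
}

# Hypothetical helper: succeed when the frontmatter tags list contains the tag.
has_tag() {
  # $1 = topic file, $2 = tag
  awk -v tag="$2" '
    /^---$/ { n++; in_tags = 0; next }
    n == 1 && /^tags:/ { in_tags = 1; next }
    n == 1 && in_tags && /^[ \t]*- / {
      t = $2
      gsub(/[^A-Za-z0-9_-]/, "", t)   # strip quotes and stray punctuation
      if (t == tag) found = 1
      next
    }
    { in_tags = 0 }
    n >= 2 { exit }
    END { exit found ? 0 : 1 }
  ' "$1"
}

check_breakthrough_tag() {
  # $1 = vault root, $2 = topic file
  slug=$(basename "$2" .md)
  if linked_from_breakthroughs "$1" "$slug" && ! has_tag "$2" breakthrough; then
    echo "WARN: $slug is linked from breakthroughs/ but lacks the breakthrough tag"
  fi
}
```

The same has_tag helper could back the pitfall rule in step 4 by swapping the directory and tag name.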

11. Topics linked from lifecycle chain have corresponding tags (INFO)

Cross-reference topics against experiments, pitfalls, and skills:

  • Topics referenced by experiments with breakthrough tag (or by notes in breakthroughs/) should themselves carry breakthrough
  • Topics referenced by pitfall notes should carry pitfall
  • This enriches the vault graph and enables tag-based filtering in Obsidian

12. Content Length & Structure Compliance (ERROR/WARN)

Every topics/*.md note (excluding topics/pitfalls/, topics/bugs/, and topics/false-positives/ subdirectories) must have body content within 300-600 words (1-2 pages) and include all required sections. Word count excludes YAML frontmatter, ## Changelog section, and image embeds (![[...]]). Pitfall topics, bug topics, and false-positive topics are handled by their own audit skills and are excluded here. See _config/content-length-spec.yaml for the canonical spec.

Required sections: Overview, Related, Changelog

How to check:

for f in "$VAULT"/topics/*.md; do
  # Skip subdirectories (pitfalls, bugs, false-positives are handled by other audits)
  case "$f" in
    */topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
  esac

  slug=$(basename "$f" .md)

  # Extract body: skip frontmatter, strip Changelog section and image embeds
  # (the Changelog section is skipped up to the next ## heading, whatever its first letter)
  body=$(awk '/^---$/{n++; next} n<2{next} /^## Changelog/{skip=1; next} /^## /{skip=0} !skip' "$f" | grep -v '^!\[\[')
  word_count=$(echo "$body" | wc -w | tr -d ' ')

  if [ "$word_count" -lt 300 ]; then
    echo "ERROR (stub): $slug is $word_count words (min 300)"
  elif [ "$word_count" -gt 720 ]; then
    echo "WARN (verbose): $slug is $word_count words (max 600, flagged above 720)"
  fi

  for section in "Overview" "Related" "Changelog"; do
    if ! grep -q "^## $section" "$f"; then
      echo "MISSING SECTION: $slug lacks ## $section"
    fi
  done
done

Severity: ERROR if below 300 words (stub content); WARN if above 720 words (600 x 1.2, unfocused); ERROR if required section missing

13. Voice & Tone Compliance (WARN)

Topics follow a 60% clarity / 40% energy voice profile. The Overview section should open with a sentence matching the pattern “This concept matters because [reason]. Here’s how it works.” The first clause establishes stakes; the second invites the reader in. Body sections clarify with concrete examples. Closing text (Related Topics or final paragraph) points forward to where this concept is applied in the vault.

Six-point checklist:

  1. Opening hook: Overview first sentence answers “why should I care about this concept?” (not “this topic covers…”)
  2. Jargon control: every domain-specific term is defined inline on first use or wikilinked to a definition note
  3. Concrete before abstract: at least one body section gives a specific example or vault application before naming the general pattern
  4. Impact quantified: at least one mention of what this concept enables or what breaks when it is misunderstood
  5. Active voice: descriptions use strong verbs: “prevents”, “accelerates”, “enables” rather than “is related to” / “can be seen in”
  6. Forward closing: last sentence of Overview or a Related section entry says what to explore next or which project applies this concept

How to check (passive voice and unexpanded acronym heuristic):

for f in "$VAULT"/topics/*.md; do
  case "$f" in
    */topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
  esac
  slug=$(basename "$f" .md)

  # Passive voice indicators (grep -c prints 0 itself on no match; || true only absorbs its exit status)
  passive=$(grep -ciE "\b(was|were|is|are|been)\s+(found|observed|noted|shown|considered|used|done|made|given)\b" "$f" || true)
  if [ "$passive" -gt 3 ]; then
    echo "WARN [voice]: $slug: $passive passive constructions (target ≤3)"
  fi

  # Unexpanded acronyms (ALL-CAPS 2-5 chars with no surrounding parenthetical)
  body=$(awk '/^---$/{n++; next} n>=2' "$f")
  unexpanded=$(echo "$body" | grep -oE '\b[A-Z]{2,5}\b' | sort -u | while read -r acronym; do
    echo "$body" | grep -qE "$acronym\s*\(|\($acronym\)" || echo "$acronym"
  done | tr '\n' ' ')
  if [ -n "$unexpanded" ]; then
    echo "WARN [voice]: $slug: unexpanded acronyms: $unexpanded"
  fi
done

Severity: WARN

14. Tag Completeness (WARN)

Every topic note’s tags array must be well-formed and carry the minimum vocabulary signals. A topic with no tags, or with only a single vague tag, is invisible to the cross-cutting Dataview queries that power the vault’s audit dashboards and article cluster generators.

Requirements:

  • tags must be a YAML array (not an inline string)
  • Minimum 2 tags
  • Must include at least one domain tag from the controlled vocabulary: ml, data-eng, quant-finance, game-dev, ops, career, omscs, real-estate, ai-agents, frontend
  • If the topic title or body contains architectural keywords (pattern, architecture, system, design, pipeline, framework), the tag architecture should be present
  • Topics that appear to document a failure pattern but are not in the pitfalls/ subdir (e.g., they describe a known anti-pattern) should carry pitfall

How to check:

VALID_DOMAINS="ml data-eng quant-finance game-dev ops career omscs real-estate ai-agents frontend"
ARCH_KEYWORDS="pattern architecture system design pipeline framework"

for f in "$VAULT"/topics/*.md; do
  case "$f" in
    */topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
  esac
  slug=$(basename "$f" .md)

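  # Extract tags block (frontmatter only; state resets at each --- so body list items are never read as tags)
  tags=$(awk '/^---$/{n++; found=0; next} n==1 && /^tags:/{found=1; next} found && /^  - /{print $2} found && !/^  - /{found=0}' "$f" | tr -d "'\"")

  # grep -c prints 0 itself when nothing matches; || true only absorbs its non-zero exit status
  tag_count=$(echo "$tags" | grep -c '[^[:space:]]' || true)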
  if [ "$tag_count" -lt 2 ]; then
    echo "WARN [tags]: $slug: fewer than 2 tags ($tag_count found)"
  fi

  has_domain=false
  for d in $VALID_DOMAINS; do
    echo "$tags" | grep -q "^${d}$" && has_domain=true && break
  done
  $has_domain || echo "WARN [tags]: $slug: no domain tag (one of: $VALID_DOMAINS)"

  # Suggest architecture tag if any architectural keyword appears in the note (title or body)
  combined=$(grep -iE "$(echo "$ARCH_KEYWORDS" | tr ' ' '|')" "$f" | head -3)
  if [ -n "$combined" ] && ! echo "$tags" | grep -q '^architecture$'; then
    echo "INFO [tags]: $slug: may warrant #architecture tag (architectural keywords detected)"
  fi
done

Severity: WARN

15. Graph Connectivity (WARN)

Topics are the vault’s connective tissue: they are most valuable when other notes link to them and they link back. An isolated topic note accumulates knowledge but contributes nothing to the vault graph and will never surface through Obsidian navigation or Dataview relationship queries.

Requirements:

  • Minimum 3 outbound wikilinks in the note body (topics are hubs: they should reference projects, experiments, and sibling topics)
  • ## Related section must exist and contain at least 1 wikilink: enforces the required section from the topic spec
  • related_projects frontmatter field must be populated: at least 1 entry linking to a project _index (check 4 validates the links resolve; this check ensures the field is non-empty)
  • Cross-dimensional link: at least one body wikilink that crosses into a different vault dimension: experiments/, skills/, research/, projects/, or breakthroughs/

How to check:

for f in "$VAULT"/topics/*.md; do
  case "$f" in
    */topics/pitfalls/*|*/topics/bugs/*|*/topics/false-positives/*) continue ;;
  esac
  slug=$(basename "$f" .md)
  body=$(awk '/^---$/{n++; next} n>=2' "$f")

  # Total outbound wikilinks in body
  total_links=$(echo "$body" | grep -oE '\[\[[^]]+\]\]' | wc -l | tr -d ' ')
  if [ "$total_links" -lt 3 ]; then
    echo "WARN [graph]: $slug: only $total_links outbound wikilinks (min 3 for a topic hub)"
  fi

  # ## Related section with at least 1 wikilink
  related_section=$(echo "$body" | awk '/^## Related/{found=1; next} found && /^## /{found=0} found{print}')
  related_links=$(echo "$related_section" | grep -cE '\[\[' || true)
  if [ "$related_links" -lt 1 ]; then
    echo "WARN [graph]: $slug: ## Related section missing or has no wikilinks"
  fi

  # related_projects frontmatter populated
  rp_count=$(awk '/^---$/{n++; found=0; next} n==1 && /^related_projects:/{found=1; next} found && /^  - /{count++} found && !/^  - /{found=0} END{print count+0}' "$f")
  if [ "$rp_count" -lt 1 ]; then
    echo "WARN [graph]: $slug: related_projects is empty (topic should link to ≥1 project)"
  fi

  # Cross-dimensional link
  cross_dim=$(echo "$body" | grep -cE '\[\[(experiments|skills|research|projects|breakthroughs)/' || true)
  if [ "$cross_dim" -lt 1 ]; then
    echo "INFO [graph]: $slug: no cross-dimensional link in body"
  fi
done

Severity: WARN for fewer than 3 outbound links, missing Related wikilinks, or empty related_projects; INFO for missing cross-dimensional link

7. PII and Secrets Scan (ERROR/WARN)

Every topic note must be free of real API keys, database credentials, and personal contact information. Env var names in documentation are fine; env var values are not. See _config/pii-rules.yaml for the canonical blocked and redact pattern lists.

Blocked patterns (ERROR): API keys (sk-, AIza, ghp_, xoxb-, AKIA, whsec_, sk_live_, sk_test_, re_), database connection strings with embedded passwords, env var assignments with real values.

Redact patterns (WARN): Personal email addresses, GCP project numbers, phone numbers, SSNs, credit card numbers, street addresses. Replace with placeholders (e.g., <personal-email>, <gcp-project-id>).

VAULT=~/vault
for f in "$VAULT"/topics/*.md; do
  [ -f "$f" ] || continue
  slug=$(basename "$f" .md)
  if grep -qE 'sk-[a-zA-Z0-9]{20,}|AIza[a-zA-Z0-9_-]{35}|ghp_[a-zA-Z0-9]{36}|xoxb-|AKIA[A-Z0-9]{16}|whsec_|sk_live_|sk_test_|re_[a-zA-Z0-9]{20,}' "$f"; then
    echo "ERROR (PII): $slug contains a blocked secret pattern"
  fi
  if grep -qE 'postgres(ql)?://[^:]+:[^@]+@|mongodb(\+srv)?://[^:]+:[^@]+@' "$f"; then
    echo "ERROR (PII): $slug contains a database connection string with credentials"
  fi
  if grep -qE 'alexdgutierreza@gmail\.com|616560719313' "$f"; then
    echo "WARN (PII): $slug contains personal email or GCP project number: replace with placeholder"
  fi
  if grep -qE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' "$f"; then
    echo "ERROR (PII): $slug may contain an SSN"
  fi
done

Severity: ERROR for blocked secrets (must fix before commit). WARN for redact patterns (fix before any content flows to public destinations).

Reference: _config/pii-rules.yaml: canonical pattern list and zone-based escalation rules.

What Complete Looks Like

A fully healthy Topics dimension meets all of the following:

  • 36/36 topics have a valid created date
  • 36/36 topics have a domain from the controlled vocabulary
  • 36/36 topics have related_projects with >= 1 valid [[projects/.../_index]] wikilink
  • 36/36 topics have >= 2 sentences of meaningful body content
  • 0 orphan topics: every topic has >= 1 inbound [[topics/{name}]] link from a project, experiment, or sibling topic
  • 10/10 domains represented: every domain in the controlled vocabulary has at least 1 topic
  • 0 invalid domain values: no typos or out-of-vocabulary domains
  • Semantic tags applied: topics describing patterns have architecture in tags; topics linked from breakthroughs/ notes or #breakthrough-tagged experiments have breakthrough
  • vault-bootstrap lint passes with 0 errors on all topic files

Example Passing Report

Topics Dimension Audit: 2026-03-31
====================================

Per-Topic Results (36 topics):

| Topic                          | created | domain    | rel_proj | body | inbound | related |
|--------------------------------|---------|-----------|----------|------|---------|---------|
| accumulate-then-flush          | PASS    | data-eng  | PASS (1) | PASS | PASS    | PASS    |
| behavioral-detection-counter…  | PASS    | career    | PASS (1) | PASS | PASS    | PASS    |
| ...                            | ...     | ...       | ...      | ...  | ...     | ...     |

Summary:
  created:          36/36 PASS
  domain:           36/36 PASS (10/10 domains covered)
  related_projects: 36/36 PASS
  body content:     36/36 PASS
  inbound links:    36/36 PASS
  related topics:   36/36 PASS

Domain Coverage:
  ml: 3 topics | data-eng: 5 | quant-finance: 2 | game-dev: 4 | ops: 3
  career: 4    | omscs: 1    | real-estate: 1    | ai-agents: 5 | frontend: 4

How to Fill Gaps

Stamp last_audited

Every note you audit or create must have last_audited: YYYY-MM-DD in its frontmatter (today’s date). This enables vault-bootstrap stale to detect notes whose source files changed after the last audit. If the field is missing, add it. If it exists, update it to today.
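
The add-or-update rule for last_audited can be sketched as an in-place rewrite. The function name stamp_last_audited is hypothetical, and the sketch assumes the note already has a frontmatter block delimited by --- lines.

```shell
# Hypothetical helper: set last_audited in frontmatter, adding the key if it is missing.
stamp_last_audited() {
  # $1 = note file, $2 = date (YYYY-MM-DD)
  tmp_file=$(mktemp) || return 1
  awk -v d="$2" '
    /^---$/ {
      n++
      # Closing delimiter reached without seeing the key: insert it just before
      if (n == 2 && !stamped) { print "last_audited: " d; stamped = 1 }
      print
      next
    }
    n == 1 && /^last_audited:/ { print "last_audited: " d; stamped = 1; next }
    { print }
  ' "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

Typical use would be stamp_last_audited "$f" "$(date +%Y-%m-%d)". Writing back with cat rather than mv keeps the note's original permissions and inode.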

Missing created

Set to the date the topic was first created. If unknown, use the earliest commit date of the file or today’s date.

created: 2026-03-31

Missing or invalid domain

Infer the domain from the topic’s name, body content, and which projects reference it:

| If the topic involves… | Assign domain |
|------------------------|---------------|
| Pipelines, ETL, data quality, batch | data-eng |
| Browser, React, Canvas, CSS, Electron | frontend |
| Monte Carlo, pricing, risk, portfolio | quant-finance |
| Game mechanics, procedural gen, combat | game-dev |
| Deploy, CI/CD, Railway, Docker, cron | ops |
| Job search, interviews, resume, hiring | career |
| LLM, agent, tool use, prompt, MCP | ai-agents |
| ML models, embeddings, training, NLP | ml |
| OMSCS, coursework, academic | omscs |
| Property, market analysis, investment | real-estate |

If a topic spans two domains, pick the primary one and add the secondary as a tag.

Empty related_projects

Match the topic to projects by checking:

  1. Does any project’s stack or description mention this topic’s concept?
  2. Does any project body contain [[topics/{this-topic}]]?
  3. Does the topic’s project: legacy field name a project?

Then add:

related_projects:
  - "[[projects/{matched-project}/_index]]"

Sparse body (< 2 sentences)

Expand the body using this template:

## Overview

{1-2 sentences: what this technique/concept is and why it matters.}

## Applications

- **{Project}**: {How this topic is applied in the project. 1-2 sentences.}

## Key Concepts

- {Concept 1}
- {Concept 2}
- {Concept 3}

Draw content from the projects listed in related_projects: the project _index.md and experiment notes are the primary source.

No inbound links

Find which project should reference this topic:

  1. Check related_projects: those projects should link back
  2. Open each project’s _index.md and add a [[topics/{slug}]] wikilink in the body where relevant

If no project references the topic, consider whether the topic is truly standalone or should be merged into a parent topic.

Missing Related Topics section

  1. Find all topics sharing the same domain
  2. Pick 2-4 that are most conceptually related
  3. Add a section at the end of the file:

## Related Topics

- [[topics/{sibling-1}]]: {one-line relationship}
- [[topics/{sibling-2}]]: {one-line relationship}

Quality Checks

1. Lint passes

Run vault-bootstrap lint after any remediation. Zero errors required.

2. Domain coverage is complete

After remediation, verify all 10 domains have >= 1 topic. If a domain has 0 topics, either:

  • Create a new topic for that domain from existing project knowledge
  • Reassign a topic whose domain was incorrectly inferred

3. No topic is disconnected from projects

Every topic should link to at least one project. A topic with no project connection is either:

  • Orphaned knowledge that should be linked to a project
  • A stub that should be expanded or deleted

4. Links are reciprocal

For every related_projects entry [[projects/X/_index]], verify that project X’s body contains [[topics/{this-topic}]]. Non-reciprocal links indicate the project notes are stale.
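
The reciprocity check can be sketched as follows. The function name check_reciprocal is hypothetical. It takes the project names already extracted from related_projects, assumes each project's note lives at projects/{name}/_index.md, and (as an assumption) accepts the aliased form [[topics/{slug}|label]] as a valid back-link.

```shell
# Hypothetical check: report projects that do not link back to the topic.
check_reciprocal() {
  # $1 = vault root, $2 = topic slug, remaining args = project names from related_projects
  vault=$1
  slug=$2
  shift 2
  for project in "$@"; do
    index="$vault/projects/$project/_index.md"
    # A missing _index.md is treated the same as a missing back-link
    if ! grep -qF -e "[[topics/$slug]]" -e "[[topics/$slug|" "$index" 2>/dev/null; then
      echo "STALE: projects/$project does not link back to topics/$slug"
    fi
  done
}
```

Empty output means every listed project links back to the topic.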

5. Domain consistency

Spot-check 5 random topics: does the domain value make sense given the body content? Flag any misclassifications.

6. No duplicate coverage

Two topics should not cover the same concept. If they do, merge the smaller into the larger and add an alias. Check by looking for topics with overlapping aliases or near-identical ## Overview sections.

Visual Enrichment

When this audit produces output that benefits from visualization:

| Finding Type | Tool | Specification |
|--------------|------|---------------|
| Domain coverage | R viz (skills/r-visualization-pipeline) | Family: COM, Template: Journal |
| Topic link density | R viz (skills/r-visualization-pipeline) | Family: NET, Template: Journal |

See topics/visual-output-routing for the full routing decision framework.

Self-improvement context: This audit skill implements the lint cycle of Pattern 4 (Compiler Wiki) from skills/self-improving-agent-patterns. The vault’s audit skills collectively form the lint+heal loop described in research/2026-04-02-karpathy-llm-knowledge-base-pattern.

Post-Audit Visual Enrichment Trigger

After completing all quality checks, dispatch a Sonnet subagent to generate visuals for this dimension:

> [!tip] Auto-generate visuals after audit
> Use the Agent tool to dispatch a subagent that runs skills/wiki-visual-enrichment for the topics dimension. This generates R charts and Figma diagrams based on the Visual Enrichment specifications above. Content-hash dedup ensures only changed articles get new visuals.