Skill

r-visualization-pipeline

data-eng, frontend, pattern, breakthrough
Trigger

User needs to create publication-quality R visualizations with ggplot2, or needs to validate visualization quality

Version: 260420

Changelog

260420: multiple edits

  • v_migrate: Changelog migrated from table to YYMMDD H3 format per versioning-standard rule 2 (V1.6 of skills upgrade plan)
  • v6: Added license, sources per V6.1/V6.2 of skills upgrade plan.
  • v1.5: Added ## Quality Checks section per V1.5 of ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md

260403: Added Visual Enrichment section + self-improving-agent-patterns cross-reference

260331: Initial creation


Description

A comprehensive R visualization system deployed as an MCP server, built around an 1800-line specification (content/plan.md) that codifies publication-quality ggplot2 practices into a reproducible, brand-aware pipeline. The system covers the full lifecycle: chart family selection, script generation from 36 battle-tested templates, brand token theming, and automated 108-check quality validation. Deployed on Railway as a remote MCP server. See internal brand system for the full project context.

Interface

Trigger: User needs a publication-quality R visualization, wants to select the right chart type for their data, or needs to validate an existing visualization against quality standards.

Inputs:

  • data_description: what data is being visualized (structure, dimensions, purpose)
  • chart_family: one of 9 families (see chart family table below)
  • brand_template: one of 4 brand templates: Slate, Aurora, Earth, Journal
  • quality_target: target rubric band: BAD (1-3), OKAY (4-5), GOOD (6-7), EXCELLENT (8-10)

Outputs:

  • r_script: complete, self-contained R script with all dependencies declared
  • validation_report: 108-check quality score with per-check pass/fail/warning
  • publication_ready_plot: PNG or PDF with brand theming applied
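The four inputs form a small, closed vocabulary, so a caller can reject bad requests before any R code runs. The sketch below shows that idea, assuming families are named by their script prefix (cmp, dst, ts, ...). The class VizRequest and the function band_for_score are hypothetical illustrations, not the server's actual code.

```python
from dataclasses import dataclass

# Allowed values transcribed from the Interface section. Identifying families by
# their script prefix (without the trailing underscore) is an assumption.
CHART_FAMILIES = {"cmp", "com", "cor", "dst", "geo", "net", "sta", "sur", "ts"}
BRAND_TEMPLATES = {"Slate", "Aurora", "Earth", "Journal"}

# Rubric bands as (name, lowest score, highest score).
BANDS = (("BAD", 1, 3), ("OKAY", 4, 5), ("GOOD", 6, 7), ("EXCELLENT", 8, 10))


def band_for_score(score: int) -> str:
    """Map a 1-10 rubric score to its band name."""
    for name, low, high in BANDS:
        if low <= score <= high:
            return name
    raise ValueError(f"score must be between 1 and 10, got {score}")


@dataclass(frozen=True)
class VizRequest:
    """Hypothetical request object mirroring the four documented inputs."""
    data_description: str
    chart_family: str
    quality_target: str
    brand_template: str = "Slate"  # the documented default template

    def __post_init__(self) -> None:
        # Reject values outside the documented sets before any R code runs.
        if self.chart_family not in CHART_FAMILIES:
            raise ValueError(f"unknown chart family: {self.chart_family!r}")
        if self.brand_template not in BRAND_TEMPLATES:
            raise ValueError(f"unknown brand template: {self.brand_template!r}")
        if self.quality_target not in {name for name, _, _ in BANDS}:
            raise ValueError(f"unknown quality target: {self.quality_target!r}")


request = VizRequest("monthly revenue by region, 2020-2025", "ts", "EXCELLENT")
```

With these bands, a score of 7 maps to GOOD, one point short of the EXCELLENT band recommended for reports and publications.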

Visualization Specification (content/plan.md)

16-section spec covering: grammar of graphics philosophy, 40+ vetted R packages, plot selection guide, color system (colorblind-safe), theme/typography, annotation, scales, multi-panel composition, export settings, 10 worked examples, 26-point quality checklist, edge cases, data ingestion, publication tables, advanced charts.

Nine Chart Families (36 battle-tested scripts)

| Family | Code | Scripts | Types |
|---|---|---|---|
| Comparison | cmp_ | 4 | Grouped bar, lollipop, violin, dumbbell |
| Composition | com_ | 4 | Stacked bar, treemap, waffle, alluvial |
| Correlation | cor_ | 4 | Scatter+trend, labeled scatter, matrix, bubble |
| Distribution | dst_ | 7 | Histogram, density, ridgeline, boxplot, raincloud, ECDF |
| Geospatial | geo_ | 4 | Choropleth, bubble map, hex tile, faceted |
| Network | net_ | 3 | Force-directed, tree, circular |
| Statistical | sta_ | 4 | Forest plot, PCA biplot, regression diagnostics, QQ |
| Survival | sur_ | 2 | Kaplan-Meier, KM+risk table |
| Time Series | ts_ | 4 | Multiline, stacked area, line+ribbon, dual facet |
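Because each script carries its family code as a filename prefix, the table doubles as a lookup registry. The sketch below transcribes it and confirms that the per-family counts add up to the 36 scripts. The file names used in the usage line (such as dst_ridgeline.R) are hypothetical.

```python
# Script counts per family, transcribed from the chart family table.
FAMILY_SCRIPTS = {
    "cmp_": ("Comparison", 4),
    "com_": ("Composition", 4),
    "cor_": ("Correlation", 4),
    "dst_": ("Distribution", 7),
    "geo_": ("Geospatial", 4),
    "net_": ("Network", 3),
    "sta_": ("Statistical", 4),
    "sur_": ("Survival", 2),
    "ts_": ("Time Series", 4),
}


def family_of(script_name: str) -> str:
    """Return the family name for a script file such as 'dst_ridgeline.R'."""
    for prefix, (family, _) in FAMILY_SCRIPTS.items():
        if script_name.startswith(prefix):
            return family
    raise KeyError(f"no family prefix matches {script_name!r}")


total = sum(count for _, count in FAMILY_SCRIPTS.values())
print(total)  # 36
print(family_of("dst_ridgeline.R"))  # Distribution
```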

See topics/brand-token-chart-families for the full mapping of families to brand tokens.

Brand Token System (4 templates x 35+ tokens)

Each brand template defines typography, surfaces, palettes (qualitative, sequential, diverging), data-viz tokens, spacing, and semantic colors:

  • Slate: Helvetica / Paul Tol Bright. Clean corporate default.
  • Aurora: Avenir / Paul Tol Vibrant. High-energy, saturated.
  • Earth: Georgia / Paul Tol Muted. Academic, subdued.
  • Journal: Palatino / Tableau-10. Publication-ready, classic serif.

All palettes are colorblind-safe by construction (Paul Tol or Tableau lineage).
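One way to picture a template is as a flat token map that chart code reads instead of hard-coding colors and fonts. The sketch below assumes that layout; the token keys are invented for illustration and are not the real 35+ token names. The hex values are Paul Tol's published "bright" qualitative scheme, which the Slate template cites.

```python
# Hypothetical token layout: keys are illustrative, not the template's real names.
# Hex values are Paul Tol's published "bright" qualitative scheme.
SLATE = {
    "typography.family": "Helvetica",
    "palette.qualitative": [
        "#4477AA", "#EE6677", "#228833", "#CCBB44",
        "#66CCEE", "#AA3377", "#BBBBBB",
    ],
}


def qualitative_colors(tokens: dict, n_series: int) -> list:
    """Return one palette color per series, refusing to recycle colors."""
    palette = tokens["palette.qualitative"]
    if n_series > len(palette):
        # Recycling colors would make two series indistinguishable.
        raise ValueError(
            f"{n_series} series exceed the {len(palette)}-color palette"
        )
    return palette[:n_series]


print(qualitative_colors(SLATE, 3))  # ['#4477AA', '#EE6677', '#228833']
```

On the R side, a list like this would typically be passed to ggplot2's scale_colour_manual(values = ...), so swapping templates changes every chart without touching the plotting code.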

Quality Validation Pipeline (108 checks)

See topics/r-viz-quality-validation-pipeline for the full validation architecture.

  • 12 automated mechanical checks (regex on script content)
  • 12 family-specific checks x 9 families = 108 total
  • Rubric scoring: BAD/OKAY/GOOD/EXCELLENT with specific criteria per family
  • Current baseline: 36/36 scripts pass, 0 errors, 37 warnings
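The mechanical layer works on script text, not rendered images, so each check reduces to a pattern search over the R source. The sketch below shows a handful of such checks; the check names and regexes are assumptions chosen to mirror items from the Quality Checks list, not the pipeline's actual 12 rules.

```python
import re

# Illustrative subset of mechanical checks; names and patterns are assumptions.
# Like any regex lint these are heuristics: [^)]* stops at the first ")", so
# labs() arguments that contain parentheses are not handled, and axis_labels
# expects x before y.
MECHANICAL_CHECKS = {
    "title": re.compile(r"\bggtitle\s*\(|\blabs\s*\([^)]*\btitle\s*="),
    "axis_labels": re.compile(r"\blabs\s*\([^)]*\bx\s*=[^)]*\by\s*="),
    "caption": re.compile(r"\blabs\s*\([^)]*\bcaption\s*="),
    "export_dpi": re.compile(r"\bggsave\s*\([^)]*\bdpi\s*="),
    "overplotting_guard": re.compile(
        r"\balpha\s*=|\bgeom_jitter\s*\(|\bposition_jitter\s*\("
    ),
}


def run_mechanical_checks(script: str) -> dict:
    """Return {check name: passed} for one R script's source text."""
    return {name: bool(p.search(script)) for name, p in MECHANICAL_CHECKS.items()}


SAMPLE = '''
library(ggplot2)
p <- ggplot(df, aes(x = year, y = value)) +
  geom_point(alpha = 0.4) +
  labs(title = "Value by year", x = "Year", y = "Value", caption = "Source: example")
ggsave("value.png", p, width = 7, height = 5, dpi = 300)
'''

print(run_mechanical_checks(SAMPLE))
```

Checks this cheap can run on every script in milliseconds, which is what makes a 36-script baseline practical to re-verify after each spec change.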

MCP Server (deployed on Railway)

Part of the MCP server ecosystem (see topics/mcp-server-ecosystem). It exposes:

  • Resources: plan sections, skill metadata, prompt templates, example scripts, validation rules
  • Tools: get_plan_section, search_plan, list_chart_families, get_quality_checklist
  • Prompts: 9 family-specific generators (create_scatter, create_distribution, create_timeseries, etc.)
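MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method. The sketch below builds the message a client would send to get_plan_section. The "section" argument name is hypothetical because the tool's input schema is not documented here (a client can read it from tools/list). A real client must also complete the initialize handshake and choose a transport first, usually through an MCP SDK.

```python
import json
from itertools import count

_request_ids = count(1)  # a requester must not reuse an id within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "section" is a hypothetical argument name; check the tool's inputSchema.
message = tool_call("get_plan_section", {"section": "color system"})
print(message)
```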

Provenance

Born from the brand token system experiment (Feb 2026, predating the experiments dimension), which produced the core insight: visualization quality becomes reproducible when design decisions are externalized into tokens and validated mechanically. The 1800-line plan emerged from iterating on the 36 scripts until all 108 checks passed, so the spec is a distillation of what worked, not a theoretical document.

Key milestones:

  • 36 scripts across 9 families: each script is self-contained, tested, and brand-aware
  • 108-check validation pipeline: automated quality gate that catches the most common visualization failures (missing labels, colorblind-unsafe palettes, poor aspect ratios, overplotting)
  • 4 brand templates: production-tested token sets that map to real publication contexts
  • MCP deployment: server on Railway exposes the full pipeline programmatically

Usage Notes

  • Start with list_chart_families to select the right family for your data shape
  • Always specify a brand template: the default (Slate) is safe but generic
  • The quality target should be EXCELLENT (8-10) for anything going into a report or publication
  • Distribution family (dst_) has the most scripts (7) because distributions are the most common visualization need and the most commonly botched
  • Survival family (sur_) has only 2 scripts but they are the most complex (KM+risk table is ~200 lines)
  • The validation pipeline catches issues that look fine on screen but fail in print (DPI, font embedding, color contrast)
  • Geospatial scripts require additional system dependencies (GDAL/PROJ): check geo_ prerequisites before running
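A preflight step can turn the geo_ prerequisite note into a check that runs before any R code. The sketch below is hypothetical: R-side geospatial packages (typically sf) link against the GDAL and PROJ libraries, and finding their command-line tools on PATH is used here only as a cheap proxy for "the system libraries are installed". Some binary R installs bundle the libraries without the tools, so treat a miss as a warning rather than proof.

```python
import shutil

# Hypothetical preflight; these command names are a proxy for GDAL/PROJ being
# installed, not a list taken from the geo_ scripts themselves.
GEO_COMMANDS = ("gdal-config", "gdalinfo", "proj")


def missing_geo_prereqs(commands=GEO_COMMANDS) -> list:
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def can_run(script_name: str) -> bool:
    """Non-geo scripts always pass; geo_ scripts need every prerequisite."""
    if not script_name.startswith("geo_"):
        return True
    return not missing_geo_prereqs()


missing = missing_geo_prereqs()
if missing:
    print("geo_ scripts will likely fail; missing:", ", ".join(missing))
```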

Quality Checks

  1. Chart renders without ggplot errors or warnings. Rscript <chart>.R 2>&1 | grep -c 'Error\|Warning' prints 0.
  2. Export DPI correct. Print: 300 DPI. Web: 144 DPI. Verify with exiftool -PixelsPerUnitX -PixelUnits <chart>.png (PNG stores resolution in pixels per meter: 300 DPI ≈ 11811, 144 DPI ≈ 5669).
  3. Fonts embed in PDF export. pdffonts <chart>.pdf lists no Type 3 fonts (typically bitmap glyphs that print poorly) and shows yes in the emb column for every font.
  4. Family code matches spec. DST (distribution), COR (correlation), TS (time-series), CMP (comparison), COM (composition). Per topics/visual-output-routing.
  5. No overplotting. For scatter plots with n>100, alpha-blending or jittering present. Verify visually or via density-overlay check.
  6. Axes labeled, title + caption present. Every chart has ggtitle, labs(x=, y=), and a source-note caption.
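Checks 2 and 3 can also be scripted instead of read off tool output by eye. The sketch below is a hypothetical helper pair. png_dpi reads the PNG pHYs chunk directly with the standard library (resolution is stored in pixels per meter, unit byte 1). problem_fonts parses pdffonts text output, assuming its usual fixed-width layout with a dashed ruler on the second line; very long font names that overflow their column would break this parsing.

```python
import re
import struct
from typing import List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def png_dpi(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if absent."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type == b"pHYs":
            x_ppm, y_ppm, unit = struct.unpack(">IIB", data[pos + 8:pos + 17])
            if unit != 1:  # unit 0 means aspect ratio only, no physical size
                return None
            return round(x_ppm * METERS_PER_INCH), round(y_ppm * METERS_PER_INCH)
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return None


def problem_fonts(pdffonts_output: str) -> List[str]:
    """List fonts that are Type 3 or not embedded, from pdffonts text output."""
    lines = pdffonts_output.splitlines()
    # Line 2 is a dashed ruler; each run of dashes marks one column's extent.
    spans = [m.span() for m in re.finditer(r"-+", lines[1])]
    (n0, n1), (t0, t1), _, (e0, e1) = spans[:4]
    problems = []
    for row in lines[2:]:
        if not row.strip():
            continue
        name = row[n0:n1].strip()
        font_type = row[t0:t1].strip()
        embedded = row[e0:e1].strip()
        if font_type.startswith("Type 3") or embedded != "yes":
            problems.append(name)
    return problems
```

An empty list from problem_fonts and a (300, 300) result from png_dpi correspond to passing checks 3 and 2 for a print export.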

Visual Enrichment

| Medium | Type | Description |
|---|---|---|
| R | STA forest plot | 108-check pass rates by family |
| Figma | Flowchart | Pipeline: data -> family -> script -> brand -> validation |

Self-Improvement Cross-Reference

This skill is the R-side implementation of the routing described in topics/visual-output-routing. It is both a tool and a product of Pattern 4 (Compiler Wiki): the quality validator is a lint cycle. For the master reference on all 6 self-improvement patterns, see skills/self-improving-agent-patterns.