Skill

r-visualization-pipeline

March 30, 2026

data-eng, frontend, pattern, breakthrough

Trigger

User needs to create publication-quality R visualizations with ggplot2, or needs to validate visualization quality

Version: 260420

Changelog

260420: multiple edits

v_migrate: Changelog migrated from table to YYMMDD H3 format per versioning-standard rule 2 (V1.6 of skills upgrade plan)
v6: Added license, sources per V6.1/V6.2 of skills upgrade plan.
v1.5: Added ## Quality Checks section per V1.5 of ~/vault/plans/2026-04-20-vault-skills-upgrade-plan.md

260403: Added Visual Enrichment section + self-improving-agent-patterns cross-reference

A comprehensive R visualization system deployed on Railway as a remote MCP server. It is built around an 1800-line specification (content/plan.md) that codifies publication-quality ggplot2 practices into a reproducible, brand-aware pipeline. The system covers the full lifecycle: chart family selection, script generation from 36 battle-tested templates, brand token theming, and automated validation against 108 quality checks. See internal brand system for the full project context.

Interface

Trigger: User needs a publication-quality R visualization, wants to select the right chart type for their data, or needs to validate an existing visualization against quality standards.

Inputs:

data_description: what data is being visualized (structure, dimensions, purpose)
chart_family: one of 9 families (see chart family table below)
brand_template: one of 4 brand templates: Slate, Aurora, Earth, Journal
quality_target: target rubric band: BAD (1-3), OKAY (4-5), GOOD (6-7), EXCELLENT (8-10)

Outputs:

r_script: complete, self-contained R script with all dependencies declared
validation_report: 108-check quality score with per-check pass/fail/warning
publication_ready_plot: PNG or PDF with brand theming applied
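The quality_target bands above can be read as a simple score lookup. The following is a minimal sketch, assuming integer scores from 1 to 10 and assuming that a chart meets its target when its score reaches the lower bound of the target band. The function names band_for_score and meets_target are hypothetical; only the band names and ranges come from this page.

```python
# Rubric bands as listed under Inputs; everything else here is illustrative.
RUBRIC_BANDS = {
    "BAD": (1, 3),
    "OKAY": (4, 5),
    "GOOD": (6, 7),
    "EXCELLENT": (8, 10),
}


def band_for_score(score: int) -> str:
    """Return the rubric band whose inclusive range contains the score."""
    for band, (low, high) in RUBRIC_BANDS.items():
        if low <= score <= high:
            return band
    raise ValueError(f"score must be between 1 and 10, got {score}")


def meets_target(score: int, quality_target: str) -> bool:
    """Assumption: a target is met once the score reaches the band's lower bound."""
    if quality_target not in RUBRIC_BANDS:
        raise ValueError(f"unknown quality target: {quality_target!r}")
    band_for_score(score)  # rejects out-of-range scores
    return score >= RUBRIC_BANDS[quality_target][0]
```

For example, a chart scored 7 lands in GOOD and would not yet satisfy an EXCELLENT target under this reading.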

Visualization Specification (content/plan.md)

16-section spec covering: grammar of graphics philosophy, 40+ vetted R packages, plot selection guide, color system (colorblind-safe), theme/typography, annotation, scales, multi-panel composition, export settings, 10 worked examples, 26-point quality checklist, edge cases, data ingestion, publication tables, advanced charts.

Nine Chart Families (36 battle-tested scripts)

Family	Code	Scripts	Types
Comparison	`cmp_`	4	Grouped bar, lollipop, violin, dumbbell
Composition	`com_`	4	Stacked bar, treemap, waffle, alluvial
Correlation	`cor_`	4	Scatter+trend, labeled scatter, matrix, bubble
Distribution	`dst_`	7	Histogram, density, ridgeline, boxplot, raincloud, ECDF
Geospatial	`geo_`	4	Choropleth, bubble map, hex tile, faceted
Network	`net_`	3	Force-directed, tree, circular
Statistical	`sta_`	4	Forest plot, PCA biplot, regression diagnostics, QQ
Survival	`sur_`	2	Kaplan-Meier, KM+risk table
Time Series	`ts_`	4	Multiline, stacked area, line+ribbon, dual facet
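As an illustration of how the family codes in the table could be used to route a script to its family, here is a small sketch. The counts are copied from the table; the helper family_of and the example file names are hypothetical, and the assumption is that script file names start with the family code.

```python
# Script counts per family code, copied from the table above.
SCRIPTS_PER_FAMILY = {
    "cmp_": 4, "com_": 4, "cor_": 4, "dst_": 7, "geo_": 4,
    "net_": 3, "sta_": 4, "sur_": 2, "ts_": 4,
}


def family_of(script_name: str) -> str:
    """Return the family code for a script, assuming a '<code>_<name>.R' file name."""
    if "_" not in script_name:
        raise ValueError(f"no family prefix in {script_name!r}")
    prefix = script_name.split("_", 1)[0] + "_"
    if prefix not in SCRIPTS_PER_FAMILY:
        raise ValueError(f"unknown family code {prefix!r} in {script_name!r}")
    return prefix


# Sanity check: nine families, 36 scripts in total.
total_scripts = sum(SCRIPTS_PER_FAMILY.values())
```

Matching on the full prefix up to the first underscore keeps the three similar codes (cmp_, com_, cor_) from being confused.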

See topics/brand-token-chart-families for the full mapping of families to brand tokens.

Brand Token System (4 templates x 35+ tokens)

Each brand template defines typography, surfaces, palettes (qualitative, sequential, diverging), data-viz tokens, spacing, and semantic colors:

Slate: Helvetica / Paul Tol Bright. Clean corporate default.
Aurora: Avenir / Paul Tol Vibrant. High-energy, saturated.
Earth: Georgia / Paul Tol Muted. Academic, subdued.
Journal: Palatino / Tableau-10. Publication-ready, classic serif.

All palettes are colorblind-safe by construction (Paul Tol or Tableau lineage).
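A brand template can be thought of as a named bundle of tokens. The sketch below models only the two tokens this page actually names per template (typeface and qualitative palette); the real templates define 35+ tokens, and the BrandTemplate type and resolve_template helper are hypothetical.

```python
from typing import NamedTuple, Optional


class BrandTemplate(NamedTuple):
    font: str                 # typography token named on this page
    qualitative_palette: str  # palette lineage named on this page


# Only the tokens listed above; the full token sets are not reproduced here.
BRAND_TEMPLATES = {
    "Slate": BrandTemplate("Helvetica", "Paul Tol Bright"),
    "Aurora": BrandTemplate("Avenir", "Paul Tol Vibrant"),
    "Earth": BrandTemplate("Georgia", "Paul Tol Muted"),
    "Journal": BrandTemplate("Palatino", "Tableau-10"),
}

DEFAULT_TEMPLATE = "Slate"  # documented default


def resolve_template(name: Optional[str] = None) -> BrandTemplate:
    """Look up a template by name, falling back to the default when none is given."""
    key = DEFAULT_TEMPLATE if name is None else name
    if key not in BRAND_TEMPLATES:
        raise ValueError(f"unknown brand template: {key!r}")
    return BRAND_TEMPLATES[key]
```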

Quality Validation Pipeline (108 checks)

See topics/r-viz-quality-validation-pipeline for the full validation architecture.

12 automated mechanical checks (regex on script content)
12 family-specific checks x 9 families = 108 total
Rubric scoring: BAD/OKAY/GOOD/EXCELLENT with specific criteria per family
Current baseline: 36/36 scripts pass, 0 errors, 37 warnings
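To make "regex on script content" concrete, here is a minimal sketch of what a mechanical check could look like. The check IDs and patterns are hypothetical and chosen to mirror items from the Quality Checks section (title, axis labels, caption, explicit export DPI); the actual rule set lives in the validation pipeline and is not reproduced on this page.

```python
import re

# Hypothetical checks: each maps an ID to a pattern that must appear in the R script.
# Regexes only approximate R syntax; they do not parse the script.
MECHANICAL_CHECKS = {
    "has_title": re.compile(r"\bggtitle\s*\(|\btitle\s*="),
    "has_x_label": re.compile(r"\bxlab\s*\(|\blabs\s*\([^)]*\bx\s*="),
    "has_y_label": re.compile(r"\bylab\s*\(|\blabs\s*\([^)]*\by\s*="),
    "has_caption": re.compile(r"\bcaption\s*="),
    "explicit_dpi": re.compile(r"\bggsave\s*\([^)]*\bdpi\s*="),
}


def run_mechanical_checks(script_text: str) -> dict:
    """Return {check_id: passed} for every check, based on the script text alone."""
    return {
        check_id: bool(pattern.search(script_text))
        for check_id, pattern in MECHANICAL_CHECKS.items()
    }


def failed_checks(script_text: str) -> list:
    """Return the sorted IDs of checks that did not match."""
    results = run_mechanical_checks(script_text)
    return sorted(check_id for check_id, ok in results.items() if not ok)


# Two illustrative R scripts held as strings.
GOOD_SCRIPT = '''
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(alpha = 0.6) +
  labs(title = "Weight vs. mileage", x = "Weight", y = "Miles per gallon",
       caption = "Source: mtcars")
ggsave("cor_scatter.png", p, dpi = 300)
'''

BARE_SCRIPT = '''
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
print(p)
'''
```

Running failed_checks on GOOD_SCRIPT yields an empty list, while BARE_SCRIPT fails all five checks. Text-level checks like these are cheap but can be fooled by commented-out code or labels containing parentheses, which is one reason a rubric score sits on top of them.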

MCP Server (deployed on Railway)

Part of the topics/mcp-server-ecosystem. Exposes:

Resources: plan sections, skill metadata, prompt templates, example scripts, validation rules
Tools: get_plan_section, search_plan, list_chart_families, get_quality_checklist
Prompts: 9 family-specific generators (create_scatter, create_distribution, create_timeseries, etc.)
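MCP clients invoke tools with a JSON-RPC 2.0 request whose method is tools/call. The sketch below only builds such a request body for the tools listed above; it does not open a connection, and the argument key "query" for search_plan is an assumption, since this page does not document the tools' parameters.

```python
import json
from itertools import count

# Tool names as listed on this page.
TOOLS = ("get_plan_section", "search_plan", "list_chart_families", "get_quality_checklist")

_request_ids = count(1)  # simple incrementing JSON-RPC ids


def build_tool_call(name: str, arguments: dict = None) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request body for one of the listed tools."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# list_chart_families is assumed to take no arguments.
families_request = build_tool_call("list_chart_families")

# "query" is a hypothetical parameter name, used for illustration only.
search_request = build_tool_call("search_plan", {"query": "colorblind-safe palettes"})

payload = json.dumps(search_request)
```

In practice an MCP client library would handle initialization and transport; the point here is only the shape of the call.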

Provenance

Born from the brand token system experiment (Feb 2026, predates experiments dimension): the insight that visualization quality is reproducible when you externalize design decisions into tokens and validate mechanically. The 1800-line plan emerged from iterating on the 36 scripts until all 108 checks passed: the spec is a distillation of what worked, not a theoretical document.

Key milestones:

36 scripts across 9 families: each script is self-contained, tested, and brand-aware
108-check validation pipeline: automated quality gate that catches the most common visualization failures (missing labels, colorblind-unsafe palettes, poor aspect ratios, overplotting)
4 brand templates: production-tested token sets that map to real publication contexts
MCP deployment: server on Railway exposes the full pipeline programmatically

Usage Notes

Start with list_chart_families to select the right family for your data shape
Always specify a brand template: the default (Slate) is safe but generic
The quality target should be EXCELLENT (8-10) for anything going into a report or publication
Distribution family (dst_) has the most scripts (7) because distributions are the most common visualization need and the most commonly botched
Survival family (sur_) has only 2 scripts but they are the most complex (KM+risk table is ~200 lines)
The validation pipeline catches issues that look fine on screen but fail in print (DPI, font embedding, color contrast)
Geospatial scripts require additional system dependencies (GDAL/PROJ): check geo_ prerequisites before running

Quality Checks

Chart renders without ggplot errors or warnings. Rscript <chart>.R 2>&1 | grep -c 'Error\|Warning' prints 0.
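Export DPI correct. Print: 300 DPI. Web: 144 DPI. Verify with exiftool <chart>.png | grep -i 'pixels per unit'. PNG files record resolution in pixels per meter, so 300 DPI reads as about 11811 and 144 DPI as about 5669.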
Fonts embed in PDF export. pdffonts <chart>.pdf lists no Type 3 fonts (often a sign of rasterized glyphs) and shows yes in the emb column for every font.
Family code matches spec. CMP (comparison), COM (composition), COR (correlation), DST (distribution), GEO (geospatial), NET (network), STA (statistical), SUR (survival), TS (time series). Per topics/visual-output-routing.
No overplotting. For scatter plots with n>100, alpha-blending or jittering present. Verify visually or via density-overlay check.
Axes labeled, title + caption present. Every chart has ggtitle, labs(x=, y=), and a source-note caption.
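The DPI check involves a unit conversion that is easy to get wrong: PNG files store resolution in pixels per meter, not dots per inch. The sketch below shows the arithmetic for the two targets named above. The helper names and the 7 x 5 inch figure size are illustrative assumptions.

```python
METERS_PER_INCH = 0.0254


def pixels_per_meter(dpi: float) -> int:
    """Convert dots per inch to the pixels-per-meter value stored in a PNG."""
    return round(dpi / METERS_PER_INCH)


def dpi_from_pixels_per_meter(ppm: float) -> int:
    """Convert a PNG pixels-per-meter value back to whole dots per inch."""
    return round(ppm * METERS_PER_INCH)


def pixel_size(width_in: float, height_in: float, dpi: float) -> tuple:
    """Pixel dimensions of a figure exported at the given physical size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


PRINT_DPI = 300  # from the check above
WEB_DPI = 144    # from the check above

# Hypothetical 7 x 5 inch figure at each target.
print_pixels = pixel_size(7, 5, PRINT_DPI)
web_pixels = pixel_size(7, 5, WEB_DPI)
```

The same physical figure therefore needs roughly twice the pixel dimensions for print as for web, which is why a plot that looks sharp on screen can still fail the print check.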

Visual Enrichment

Medium	Type	Description
R	`STA` forest plot	108-check pass rates by family
Figma	Flowchart	Pipeline: data -> family -> script -> brand -> validation

Self-Improvement Cross-Reference

This skill is the R-side implementation of the routing described in topics/visual-output-routing. It is both a tool and a product of Pattern 4 (Compiler Wiki): the quality validator is a lint cycle. For the master reference on all 6 self-improvement patterns, see skills/self-improving-agent-patterns.