Gmail Voice Distillation
Second personality experiment for The Public Lab. Extracts and measures a personal email brand voice from the Gmail sent folder: the email complement to the Twitter voice profile (brand-voice).
Hypothesis
Email voice is distinct from social-media voice. The same person writes differently in email: longer, more structured, and with a formality that shifts by recipient type. Measuring these dimensions produces a usable profile for AI-assisted email drafting that preserves authentic voice.
Method
Mirror the brand-voice pipeline architecture:
- Extract: Pull all sent emails via the Gmail API (using the OAuth tokens from jobs-apply)
- Corpus: Parse MIME messages, strip HTML and quoted content, and structure the result into a clean corpus
- Measure: 12 deterministic dimensions (no LLM): length, greeting patterns, closing patterns, sentence structure, punctuation, formality, questions, emoji, structure, response ratio, temporal patterns, paragraph rhythm
- Classify: LLM-assisted recipient segmentation into 6 categories
- Profile: Synthesize email-voice-profile.json with per-segment voice variations
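A minimal sketch of the Corpus and Measure steps, using only Python's standard-library `email` package. This is not the pipeline's actual code. It assumes the raw RFC 822 bytes are already in hand (for example, the base64url-decoded `raw` field returned by the Gmail API's `users.messages.get` with `format=raw`), treats lines starting with `>` and everything after an `On ... wrote:` line as quoted content, and uses hypothetical greeting and closing word lists. HTML-only messages are not handled here, and only four of the 12 dimensions are shown.

```python
import email
import re
from email import policy

# Hypothetical heuristics: real replies also use Outlook-style headers, signatures, etc.
QUOTE_HEADER = re.compile(r"^On .+ wrote:$")
GREETING = re.compile(r"^(hi|hey|hello|dear)\b", re.IGNORECASE)
CLOSING = re.compile(r"^(thanks|thank you|best|cheers|regards)\b", re.IGNORECASE)


def plain_body(raw: bytes) -> str:
    """Return the text/plain body of a raw RFC 822 message ('' if there is none)."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def strip_quoted(body: str) -> str:
    """Drop '>'-quoted lines and everything after an 'On ... wrote:' line."""
    kept = []
    for line in body.splitlines():
        if QUOTE_HEADER.match(line.strip()):
            break  # the rest of the message is the quoted thread
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def measure(body: str) -> dict:
    """Four of the deterministic dimensions: length, greeting, closing, questions."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    return {
        "words": len(body.split()),
        "has_greeting": bool(lines) and bool(GREETING.match(lines[0])),
        # Look for a sign-off among the last three non-empty lines.
        "has_closing": any(CLOSING.match(ln) for ln in lines[-3:]),
        "questions": body.count("?"),
    }
```

Because every function is a pure string transformation, the same input always yields the same measurements, which keeps this step consistent with the "no LLM" constraint.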
Key Innovation
Per-segment voice variations: measuring how voice shifts among professional-internal, professional-external, cold-outreach, follow-up, personal, and transactional contexts. Twitter voice is (mostly) a single register; email voice is multi-register.
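One way to make the per-segment idea concrete is to group per-email measurements by their segment label and summarise each group separately. The sketch below assumes each email has already been measured and labelled by the Classify step; the row layout and the two summary fields (`median_words`, `greeting_rate`) are hypothetical choices, not the profile's actual schema. Rows whose segment is not one of the six categories are ignored.

```python
from collections import defaultdict
from statistics import median

# The six recipient segments named above.
SEGMENTS = (
    "professional-internal", "professional-external", "cold-outreach",
    "follow-up", "personal", "transactional",
)


def per_segment_profile(rows: list[dict]) -> dict:
    """Summarise measurements per segment.

    Each row is assumed to look like
    {"segment": str, "words": int, "has_greeting": bool}.
    Segments with no emails map to None.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["segment"]].append(row)

    profile = {}
    for segment in SEGMENTS:
        emails = grouped.get(segment, [])
        if not emails:
            profile[segment] = None  # unpopulated segment
            continue
        profile[segment] = {
            "n": len(emails),
            "median_words": median(e["words"] for e in emails),
            "greeting_rate": sum(e["has_greeting"] for e in emails) / len(emails),
        }
    return profile
```

Comparing a field such as `median_words` across segments is a simple way to quantify how far the registers diverge; a segment that comes back `None` is what the "all 6 segments populated" metric is meant to catch.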
Metrics
- Corpus size (emails extracted, post-filtering)
- Dimension coverage (12/12 non-empty)
- Segment distribution (all 6 segments populated)
- Profile completeness (all rule fields filled)
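The coverage metrics above can be checked mechanically once a profile exists. This is a sketch under stated assumptions: the `"dimensions"` and `"segments"` keys and the 12 dimension key names are hypothetical (derived from the Method list), not the real layout of email-voice-profile.json. `None` and empty strings, lists, or dicts count as missing, while `0` counts as a valid measurement (an email set with no emoji is still measured).

```python
# Hypothetical key names for the 12 dimensions listed under Method.
DIMENSIONS = (
    "length", "greeting", "closing", "sentence_structure", "punctuation",
    "formality", "questions", "emoji", "structure", "response_ratio",
    "temporal", "paragraph_rhythm",
)
SEGMENTS = (
    "professional-internal", "professional-external", "cold-outreach",
    "follow-up", "personal", "transactional",
)


def is_filled(value) -> bool:
    """None and empty str/list/dict count as missing; 0 is a real measurement."""
    if value is None:
        return False
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return False
    return True


def check_metrics(profile: dict) -> dict:
    """Report dimension coverage and segment distribution for a profile dict.

    Assumes a hypothetical layout: {"dimensions": {...}, "segments": {...}}.
    """
    dims = profile.get("dimensions", {})
    segs = profile.get("segments", {})
    missing_dims = [d for d in DIMENSIONS if not is_filled(dims.get(d))]
    missing_segs = [s for s in SEGMENTS if not is_filled(segs.get(s))]
    return {
        "dimension_coverage": f"{len(DIMENSIONS) - len(missing_dims)}/{len(DIMENSIONS)}",
        "segment_distribution": f"{len(SEGMENTS) - len(missing_segs)}/{len(SEGMENTS)}",
        "missing": missing_dims + missing_segs,
    }
```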
Results
To be filled after pipeline run.
Connection to Brand Voice
| Aspect | Twitter (brand-voice) | Email (email-voice) |
|---|---|---|
| Corpus | 12,459 tweets → 1,149 after filtering | TBD sent emails |
| Archetype | "The Dry Observer" | TBD |
| Register | Single (casual-authoritative) | Multi (varies by segment) |
| Key dimension | Brevity (median 86 chars) | TBD |
| Anti-patterns | LinkedIn voice, long threads | TBD |