Experiment / Personality / email-voice

Email voice is distinct from social-media voice. The same person writes differently in email: messages are longer and more structured, and formality shifts with the type of recipient. Measuring these dimensions should yield a usable profile for AI-assisted email drafting that preserves authentic voice.

April 12, 2026

Tags: personality, voice-distillation, gmail, public-lab

Result: pending

Gmail Voice Distillation

Second personality experiment for The Public Lab. It extracts and measures personal email brand voice from the Gmail Sent folder, as the email complement to the Twitter voice profile (brand-voice).

Hypothesis

Email voice is distinct from social-media voice. The same person writes differently in email: messages are longer and more structured, and formality shifts with the type of recipient. Measuring these dimensions should yield a usable profile for AI-assisted email drafting that preserves authentic voice.

Method

Mirror the brand-voice pipeline architecture:

Extract: Pull all sent emails via the Gmail API (OAuth tokens from jobs-apply)
Corpus: Parse MIME, strip HTML and quoted content, and structure the result into a clean corpus
Measure: 12 deterministic dimensions (no LLM): length, greeting patterns, closing patterns, sentence structure, punctuation, formality, questions, emoji, structure, response ratio, temporal, paragraph rhythm
Classify: LLM-assisted recipient segmentation (6 categories)
Profile: Synthesize email-voice-profile.json with per-segment voice variations
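The Corpus and Measure steps can be sketched with Python's standard `email` package. This is a minimal sketch, not the pipeline's code: it assumes each sent message has already been fetched as raw RFC 822 bytes (the Gmail API's `users.messages.get` with `format="raw"` returns them base64url-encoded), and the helper names, greeting and closing word lists, and quote-detection rules are hypothetical. Only a few of the 12 dimensions are shown.

```python
import email
import re
from email import policy

# Hypothetical rules for the sketch; a real corpus would need a broader set.
QUOTE_HEADER = re.compile(r"^On .+ wrote:$")
HTML_TAG = re.compile(r"<[^>]+>")
GREETING = re.compile(r"^(hi|hello|hey|dear)\b", re.IGNORECASE)
CLOSING = re.compile(r"^(thanks|thank you|best|cheers|regards)\b", re.IGNORECASE)


def clean_body(raw: bytes) -> str:
    """Parse a raw RFC 822 message and return its own text, minus quoted replies."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()
    if part.get_content_type() == "text/html":
        text = HTML_TAG.sub(" ", text)  # crude tag stripping; an HTML parser is safer
    kept = []
    for line in text.splitlines():
        if QUOTE_HEADER.match(line.strip()):
            break  # the rest of the message is the quoted thread
        if line.lstrip().startswith(">"):
            continue  # skip inline quoted lines
        kept.append(line.rstrip())
    return "\n".join(kept).strip()


def measure(body: str) -> dict:
    """Compute an illustrative subset of the deterministic dimensions."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    words = body.split()
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    return {
        "length_words": len(words),
        "has_greeting": bool(lines) and bool(GREETING.match(lines[0])),
        # closings usually sit just above the signature, so check the last 3 lines
        "has_closing": any(CLOSING.match(line) for line in lines[-3:]),
        "questions": body.count("?"),
        "exclamations": body.count("!"),
        "avg_sentence_words": round(len(words) / len(sentences), 1) if sentences else 0.0,
        "paragraphs": sum(1 for p in body.split("\n\n") if p.strip()),
    }
```

Cutting everything after an `On <date> <name> wrote:` line is a deliberately blunt rule; clients that wrap the attribution across two lines or use other reply separators would need extra patterns.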

Key Innovation

Per-segment voice variations: measuring how voice shifts across professional-internal, professional-external, cold-outreach, follow-up, personal, and transactional contexts. Twitter voice is (mostly) one register; email voice is multi-register.
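Measuring per-segment variation amounts to grouping the measured dimensions by segment label and comparing each segment against the whole corpus. In this minimal sketch the labels are assumed to come from the Classify step (the LLM call is not shown), the shift is taken as the difference between segment and corpus medians, and `segment_profile`, `write_profile`, and the JSON layout are hypothetical, not the actual schema of email-voice-profile.json.

```python
import json
from collections import defaultdict
from statistics import median

# The six segments named in the experiment; labels come from the Classify step.
SEGMENTS = ("professional-internal", "professional-external", "cold-outreach",
            "follow-up", "personal", "transactional")


def segment_profile(records: list[dict], dims: tuple[str, ...]) -> dict:
    """Per-segment medians for each dimension and their shift from the corpus median."""
    by_segment = defaultdict(list)
    for rec in records:
        by_segment[rec["segment"]].append(rec)
    overall = {d: median(r[d] for r in records) for d in dims}
    segments = {}
    for seg in SEGMENTS:
        rows = by_segment.get(seg, [])
        if not rows:
            segments[seg] = None  # unpopulated segment, surfaced by the metrics
            continue
        seg_median = {d: median(r[d] for r in rows) for d in dims}
        segments[seg] = {
            "n": len(rows),
            "median": seg_median,
            "shift": {d: seg_median[d] - overall[d] for d in dims},
        }
    return {"overall": overall, "segments": segments}


def write_profile(profile: dict, path: str = "email-voice-profile.json") -> None:
    """Serialize the profile to disk; the JSON layout here is an assumption."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(profile, f, indent=2)
```

Medians are used rather than means so that one unusually long thread does not drag a whole segment; a ratio or z-score would work equally well as the shift measure.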

Metrics

Corpus size (emails extracted, post-filtering)
Dimension coverage (12/12 non-empty)
Segment distribution (all 6 segments populated)
Profile completeness (all rule fields filled)
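Each of these metrics reduces to a simple check over a run's outputs. A minimal sketch, assuming the corpus is a list of cleaned bodies, each dimension's values are collected under its name, segment counts are keyed by the six segment labels, and `required_fields` lists the profile's rule fields; `run_metrics`, `is_filled`, and the shape of those inputs are hypothetical.

```python
# The six segments named in the experiment.
SEGMENTS = ("professional-internal", "professional-external", "cold-outreach",
            "follow-up", "personal", "transactional")


def is_filled(value) -> bool:
    """Treat None and empty strings/containers as unfilled; anything else counts."""
    if value is None:
        return False
    if isinstance(value, (str, list, dict, tuple)):
        return len(value) > 0
    return True


def run_metrics(corpus: list[str], dimension_values: dict[str, list],
                segment_counts: dict[str, int], profile: dict,
                required_fields: tuple[str, ...]) -> dict:
    """Compute the four run metrics for one pipeline run (illustrative layout)."""
    non_empty = sum(1 for vals in dimension_values.values() if vals)
    return {
        "corpus_size": len(corpus),  # emails remaining after filtering
        "dimension_coverage": f"{non_empty}/{len(dimension_values)}",
        "segments_populated": sum(1 for s in SEGMENTS if segment_counts.get(s, 0) > 0),
        "profile_complete": all(is_filled(profile.get(f)) for f in required_fields),
    }
```

Under these assumptions, a run meets the targets above when it reports a non-zero corpus size, `12/12` coverage, 6 populated segments, and `profile_complete` true.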

Results

To be filled after pipeline run.

Connection to Brand Voice

Aspect	Twitter (brand-voice)	Email (email-voice)
Corpus	12,459 tweets -> 1,149 after filtering	TBD sent emails
Archetype	“The Dry Observer”	TBD
Register	Single (casual-authoritative)	Multi (varies by segment)
Key dimension	Brevity (median 86 chars)	TBD
Anti-patterns	LinkedIn voice, long threads	TBD