Experiment Personality email-voice

Email voice is distinct from social-media voice. The same person writes differently in email (longer, more structured, context-dependent formality shifts by recipient type). Measuring these dimensions creates a usable profile for AI-assisted email drafting that preserves authentic voice.

Tags: personality, voice-distillation, gmail, public-lab

Result: pending

Gmail Voice Distillation

Second personality experiment for The Public Lab. It extracts and measures personal email voice from the Gmail sent folder, as the email complement to the Twitter voice profile (brand-voice).

Hypothesis

Email voice is distinct from social media voice. The same person writes differently in email (longer, more structured, context-dependent formality shifts by recipient type). Measuring these dimensions creates a usable profile for AI-assisted email drafting that preserves authentic voice.

Method

Mirror the brand-voice pipeline architecture:

  1. Extract: Pull all sent emails via Gmail API (OAuth tokens from jobs-apply)
  2. Corpus: Parse MIME, strip HTML/quoted content, structure into clean corpus
  3. Measure: 12 deterministic dimensions (no LLM): length, greeting patterns, closing patterns, sentence structure, punctuation, formality, questions, emoji, structure, response ratio, temporal, paragraph rhythm
  4. Classify: LLM-assisted recipient segmentation (6 categories)
  5. Profile: Synthesize email-voice-profile.json with per-segment voice variations
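
As a rough illustration of steps 2 and 3 above, the sketch below parses one raw message with Python's standard email module, drops quoted reply content, and computes a few deterministic dimensions. The regular expressions, function names, and output keys are assumptions made for this example; the source does not specify the pipeline's actual rules, and only five of the 12 dimensions are shown.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical patterns: the real pipeline's rules are not given in the source.
QUOTE_HEADER = re.compile(r"^On .+ wrote:$")
GREETING = re.compile(r"^(hi|hey|hello|dear)\b", re.IGNORECASE)


def plain_body(msg: Message) -> str:
    """Return the first text/plain part of a message, or an empty string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def strip_quoted(body: str) -> str:
    """Drop '>'-quoted lines and everything from an 'On ... wrote:' header on."""
    kept = []
    for line in body.splitlines():
        if QUOTE_HEADER.match(line.strip()):
            break
        if not line.lstrip().startswith(">"):
            kept.append(line)
    return "\n".join(kept).strip()


def measure(text: str) -> dict:
    """Compute a handful of deterministic dimensions for one cleaned email."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "length_chars": len(text),
        "has_greeting": bool(lines and GREETING.match(lines[0].strip())),
        "sentence_count": len(sentences),
        "question_count": text.count("?"),
        "paragraph_count": len(paragraphs),
    }


raw = (
    "From: me@example.com\n"
    "To: you@example.com\n"
    "Subject: Re: Draft\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hi Sam,\n"
    "\n"
    "Thanks for the draft. Can we talk Friday?\n"
    "\n"
    "On Mon, Jan 1, 2024, Sam wrote:\n"
    "> Here is the draft.\n"
)
clean = strip_quoted(plain_body(message_from_string(raw)))
print(measure(clean))
```

Because every measurement is a plain string operation, rerunning it on the same corpus gives the same numbers, which is what "deterministic (no LLM)" requires. HTML-only messages would need an additional stripping step that this sketch leaves out.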

Key Innovation

Per-segment voice variations: measuring how voice shifts between professional-internal, professional-external, cold-outreach, follow-up, personal, and transactional contexts. Twitter voice is (mostly) one register; email voice is multi-register.
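
One way to picture the per-segment idea is a simple group-and-summarize pass: each measured email carries a segment label, and the profile stores a summary statistic per dimension per segment. The sketch below is illustrative only. The six segment names come from the text above; the record layout, the function name, and the use of the median are assumptions.

```python
from collections import defaultdict
from statistics import median

# The six recipient segments named in the text above.
SEGMENTS = (
    "professional-internal",
    "professional-external",
    "cold-outreach",
    "follow-up",
    "personal",
    "transactional",
)


def per_segment_profile(records):
    """Summarize numeric dimensions per segment.

    records: iterable of dicts holding a 'segment' key plus numeric
    dimension values (a hypothetical layout for this example).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        seg = rec["segment"]
        if seg not in SEGMENTS:
            raise ValueError(f"unknown segment: {seg!r}")
        for key, value in rec.items():
            if key != "segment":
                grouped[seg][key].append(value)
    # Segments with no emails map to an empty dict rather than being omitted,
    # so gaps in the segment distribution stay visible.
    return {
        seg: {dim: median(vals) for dim, vals in grouped[seg].items()}
        for seg in SEGMENTS
    }


records = [
    {"segment": "personal", "length_chars": 100, "question_count": 1},
    {"segment": "personal", "length_chars": 300, "question_count": 0},
    {"segment": "cold-outreach", "length_chars": 900, "question_count": 2},
]
profile = per_segment_profile(records)
print(profile["personal"])
print(profile["cold-outreach"])
```

Comparing the same dimension across segments (for example, length in personal versus cold-outreach mail) is what makes the multi-register claim measurable.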

Metrics

  • Corpus size (emails extracted, post-filtering)
  • Dimension coverage (12/12 non-empty)
  • Segment distribution (all 6 segments populated)
  • Profile completeness (all rule fields filled)
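
These four metrics could be checked mechanically once a profile exists. The sketch below assumes a profile dict with `corpus_size`, `dimensions`, `segments` (each with an `email_count`), and `rules` keys; that schema is hypothetical, since the source does not describe the layout of email-voice-profile.json.

```python
EMPTY = (None, "", [], {})


def evaluate(profile: dict, expected_dimensions: int = 12,
             expected_segments: int = 6) -> dict:
    """Compute the four success metrics for a profile (assumed schema)."""
    dims = profile.get("dimensions", {})
    segments = profile.get("segments", {})
    rules = profile.get("rules", {})

    non_empty_dims = sum(1 for v in dims.values() if v not in EMPTY)
    populated = sum(
        1 for s in segments.values() if s.get("email_count", 0) > 0
    )
    return {
        "corpus_size": profile.get("corpus_size", 0),
        "dimension_coverage": f"{non_empty_dims}/{expected_dimensions}",
        "all_segments_populated": populated == expected_segments,
        # Complete only if rules exist and none of them is empty.
        "profile_complete": bool(rules)
        and all(v not in EMPTY for v in rules.values()),
    }


example = {
    "corpus_size": 3,
    "dimensions": {"length": {"median_chars": 200}, "emoji": {}},
    "segments": {"personal": {"email_count": 2},
                 "cold-outreach": {"email_count": 1}},
    "rules": {"greeting": "first name only", "closing": ""},
}
print(evaluate(example))
```

The example data is made up for the demonstration and does not represent any result of the experiment.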

Results

To be filled after pipeline run.

Connection to Brand Voice

Aspect        | Twitter (brand-voice)           | Email (email-voice)
--------------|---------------------------------|--------------------------
Corpus        | 12,459 tweets -> 1,149 filtered | TBD sent emails
Archetype     | "The Dry Observer"              | TBD
Register      | Single (casual-authoritative)   | Multi (varies by segment)
Key dimension | Brevity (median 86 chars)       | TBD
Anti-patterns | LinkedIn voice, long threads    | TBD