Experiment Personality media-diet

April 16, 2026

personality, media, ingest, public-lab

Hypothesis

The YouTube Data API combined with the Google Takeout watch-history.json archive can produce a complete vault-indexable corpus of personal video consumption, capturing title, channel, duration, watch timestamp, and category for enough items to measure the video-consumption side of the personality vector.

Result: pending

Changelog

Date	Summary
2026-04-17	Initial experiment authored. Ingest spec written for Takeout archive plus Data API enrichment. Result pending run.

YouTube is the most complete consumption record any single platform holds on me. Every video clicked, every video finished, every video skipped is logged to the account. The Google Takeout archive exports the full history as watch-history.json. The YouTube Data API v3 fills in per-video metadata (duration, category, topic categories) that the Takeout file omits. Together they should produce a one-frame-per-video corpus with enough fields to classify consumption lanes and compare them against brand-voice production lanes.

Method

Takeout pull (primary for historical coverage). Request a Google Takeout export for the YouTube product, with the history format set to JSON (the default is HTML). The resulting archive contains Takeout/YouTube and YouTube Music/history/watch-history.json, which lists every watch event with title, titleUrl (the video URL), subtitles (channel name and URL), time, and activityControls. Parse the events into frames at ~/vault/media/youtube/{video_id}.md, extracting the video ID from the v parameter of the canonical https://www.youtube.com/watch?v={id} URL.
Data API enrichment (primary for incremental metadata sync). Register a Google Cloud project with the YouTube Data API v3 enabled. Batch-fetch videos.list?part=snippet,contentDetails,topicDetails&id={id} with up to 50 comma-separated IDs per call. Append the returned categoryId and tags (from snippet), duration (ISO 8601, from contentDetails), and topicCategories (from topicDetails) to the corresponding frames.
Transcript pull (optional, on demand). For videos where categoryId maps to a lane of interest (documentary, tech, news), fetch the transcript via the timedtext endpoint and store the plain-text transcript in the frame body. The endpoint is undocumented, so treat transcript availability as best-effort. Skip music and gaming categories by default to keep corpus size manageable.
Integration. Add an ingest step (media-diet-youtube) to BloomNet. rv index picks up the new directory automatically.
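
The Takeout parse and the batching for videos.list can be sketched as below. This is a minimal sketch, not the ingest step itself: the function names and the output dict keys are hypothetical, the "Watched " title prefix is assumed from English-locale exports, and entries without a titleUrl (for example removed videos) are assumed to be skippable. It only builds the request URL and performs no network call. It needs Python 3.9 or later for str.removeprefix.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Public endpoint behind videos.list in the YouTube Data API v3.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def video_id(title_url):
    """Return the v= query parameter of a watch URL, or None if absent."""
    if not title_url:
        return None
    ids = parse_qs(urlparse(title_url).query).get("v")
    return ids[0] if ids else None


def parse_watch_history(raw_json):
    """Turn the Takeout JSON text into one dict per watch event that has a video ID."""
    events = []
    for entry in json.loads(raw_json):
        vid = video_id(entry.get("titleUrl"))
        if vid is None:
            continue  # assumption: entries with no resolvable ID are dropped
        channel = (entry.get("subtitles") or [{}])[0]
        events.append({
            "video_id": vid,
            # assumption: English exports prefix each title with "Watched "
            "title": entry.get("title", "").removeprefix("Watched ").strip(),
            "channel": channel.get("name"),
            "channel_url": channel.get("url"),
            "watched_at": entry.get("time"),
        })
    return events


def batches(ids, size=50):
    """Yield de-duplicated IDs in chunks of at most `size`, preserving order."""
    unique = list(dict.fromkeys(ids))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


def videos_list_url(ids, api_key):
    """Build the videos.list request URL for one batch of IDs."""
    params = {
        "part": "snippet,contentDetails,topicDetails",
        "id": ",".join(ids),
        "key": api_key,
    }
    return f"{API_URL}?{urlencode(params)}"
```

De-duplicating before batching matters because a rewatched video appears once per watch event in the history but only needs one metadata lookup.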

Results

Pending. Will measure:

Corpus size (videos extracted from Takeout, videos enriched via API, transcript fetch hit rate).
Field coverage (percentage of frames with non-empty duration, channel, category).
Runtime (parse, enrich, transcript pull, total end-to-end).
Lane distribution (top categories by watch count, top channels by total watch time).
Delta vs brand-voice production lanes (Geo/OSINT, Humor, Culture, Tech).
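
The field-coverage and lane-distribution measurements above could be computed along these lines. This is a sketch under stated assumptions: frames are represented as plain dicts with hypothetical keys duration, channel, and category_id, and, because the Takeout file does not record how much of a video was played, the full video duration stands in for watch time.

```python
import re
from collections import Counter

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT1H2M3S".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_seconds(iso):
    """Convert an ISO 8601 duration to seconds; None if missing or unparseable."""
    if not iso:
        return None
    match = _DURATION.match(iso)
    if not match:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def field_coverage(frames, fields=("duration", "channel", "category_id")):
    """Percentage of frames with a non-empty value for each field."""
    total = len(frames)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(1 for frame in frames if frame.get(field)) / total
        for field in fields
    }


def lane_distribution(frames):
    """Return (categories by watch count, channels by summed video duration)."""
    by_category = Counter()
    by_channel_seconds = Counter()
    for frame in frames:
        if frame.get("category_id"):
            by_category[frame["category_id"]] += 1
        secs = duration_seconds(frame.get("duration"))
        if frame.get("channel") and secs:
            # proxy: full duration, since actual time watched is not exported
            by_channel_seconds[frame["channel"]] += secs
    return by_category.most_common(), by_channel_seconds.most_common()
```

Feeding one dict per watch event rather than one per video makes rewatches count toward both the category tally and the channel totals.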

Findings

Pending.

Next Steps

If confirmed, wire a weekly refresh that enriches newly seen video IDs via the Data API and picks up new watch events from each subsequent Takeout export. The Data API v3 does not return a user's watch history, so Takeout remains the source of the watch events themselves. If refuted (API quota too restrictive, Takeout too sparse), fall back to Takeout-only and document the quota bound explicitly. Either outcome feeds the reconciliation scanner defined in the media-diet project frame, producing the measurable consumption-versus-production delta that is the point of the fourth personality vector.