The YouTube Data API combined with the Google Takeout watch-history.json archive can produce a complete vault-indexable corpus of personal video consumption, capturing title, channel, duration, watch timestamp, and category for enough items to measure the video-consumption side of the personality vector.

Changelog
| Date | Summary |
|---|---|
| 2026-04-17 | Initial experiment authored. Ingest spec written for Takeout archive plus Data API enrichment. Result pending run. |
Hypothesis
YouTube is the most complete consumption record any single platform holds on me. Every video opened while watch history is enabled is logged to the account, although the log records the watch event itself rather than whether the video was finished or skipped. The Google Takeout archive exports the full history as `watch-history.json`. The YouTube Data API v3 fills in metadata (duration, category, topic categories) that the Takeout file omits. Together they should produce a one-frame-per-video corpus with enough fields to classify consumption lanes and compare them against brand-voice production lanes.
Method
- Takeout pull (primary for historical coverage). Request a Google Takeout for the YouTube product. The resulting archive contains `Takeout/YouTube and YouTube Music/history/watch-history.json`, which lists every watch event with `title`, `titleUrl` (the video URL), `subtitles` (channel name and URL), `time`, and `activityControls`. Parse into frames at `~/vault/media/youtube/{video_id}.md`. The video ID is extracted from the canonical `https://www.youtube.com/watch?v={id}` URL.
- Data API enrichment (primary for incremental sync). Register a Google Cloud project with the YouTube Data API v3 enabled. Batch-fetch `videos.list?part=snippet,contentDetails,topicDetails&id={id}` with up to 50 IDs per call. Append the returned `categoryId`, `duration` (ISO 8601), `tags`, and `topicCategories` to the corresponding frames.
- Transcript pull (optional, on demand). For videos where `categoryId` maps to a lane of interest (documentary, tech, news), fetch the transcript via the `timedtext` endpoint and store the plain-text transcript in the frame body. Skip music and gaming categories by default to keep corpus size manageable.
- Integration. Add an ingest step (`media-diet-youtube`) to BloomNet. `rv index` picks up the new directory automatically.
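
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual ingest step: the function names (`extract_video_id`, `parse_watch_history`, `batches`, `videos_list_url`) and the output dict keys are hypothetical, entries without a watch URL are assumed to be skippable, and no network call is made. The sketch only builds the `videos.list` request URL for each batch of up to 50 IDs.

```python
import json
from urllib.parse import urlparse, parse_qs, urlencode

# Public endpoint for videos.list in the YouTube Data API v3.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def extract_video_id(title_url):
    """Return the v= parameter of a canonical watch URL, or None."""
    if not title_url:
        return None
    parsed = urlparse(title_url)
    if not parsed.path.endswith("/watch"):
        return None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


def parse_watch_history(raw_json):
    """Turn the Takeout JSON text into one dict per watch event."""
    events = []
    for entry in json.loads(raw_json):
        video_id = extract_video_id(entry.get("titleUrl"))
        if video_id is None:
            continue  # assumption: entries without a watch URL are skipped
        subtitles = entry.get("subtitles") or [{}]
        events.append({
            "video_id": video_id,
            "title": entry.get("title", ""),
            "channel": subtitles[0].get("name", ""),
            "channel_url": subtitles[0].get("url", ""),
            "watched_at": entry.get("time", ""),
        })
    return events


def batches(video_ids, size=50):
    """De-duplicate IDs (keeping first-seen order) and split into chunks."""
    unique = list(dict.fromkeys(video_ids))
    return [unique[i:i + size] for i in range(0, len(unique), size)]


def videos_list_url(video_ids, api_key):
    """Build the videos.list URL for one batch; api_key is a placeholder."""
    params = {
        "part": "snippet,contentDetails,topicDetails",
        "id": ",".join(video_ids),
        "key": api_key,
    }
    return API_URL + "?" + urlencode(params)
```

De-duplicating before batching matters because rewatched videos appear once per watch event in the Takeout file, while the enrichment call only needs each ID once.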
Results
Pending. Will measure:
- Corpus size (videos extracted from Takeout, videos enriched via API, transcript fetch hit rate).
- Field coverage (percentage of frames with non-empty duration, channel, category).
- Runtime (parse, enrich, transcript pull, total end-to-end).
- Lane distribution (top categories by watch count, top channels by total watch time).
- Delta vs brand-voice production lanes (Geo/OSINT, Humor, Culture, Tech).
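
As a rough sketch of how the field-coverage and lane-distribution measurements might be computed once frames exist: the frame dict keys (`duration`, `channel`, `category_id`) and all function names below are assumptions made for illustration. Note also that the API's `duration` is the video's length, not the time actually watched, so the per-channel totals here are an upper-bound proxy for watch time.

```python
import re
from collections import Counter

# ISO 8601 durations as returned in contentDetails.duration, e.g. PT4M13S.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def duration_seconds(iso):
    """Convert an ISO 8601 duration string to seconds, or None if invalid."""
    match = _DURATION.match(iso or "")
    if not match:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def field_coverage(frames, fields=("duration", "channel", "category_id")):
    """Percentage of frames with a non-empty value for each field."""
    total = len(frames)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(1 for frame in frames if frame.get(field)) / total
        for field in fields
    }


def lane_distribution(frames):
    """Return (categories by watch count, channels by summed video length)."""
    by_category = Counter()
    seconds_by_channel = Counter()
    for frame in frames:
        by_category[frame.get("category_id") or "unknown"] += 1
        seconds = duration_seconds(frame.get("duration"))
        if seconds and frame.get("channel"):
            seconds_by_channel[frame["channel"]] += seconds
    return by_category.most_common(), seconds_by_channel.most_common()
```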
Findings
Pending.
Next Steps
If confirmed, wire a weekly refresh that pulls incremental watch history via the Data API and backfills any gaps with the next Takeout export. One point to verify before building this: the Data API v3 no longer returns the account's watch-history playlist, so the incremental pull may have to come from scheduled Takeout exports, with the API used only for enrichment. If refuted (API quota too restrictive, Takeout too sparse), fall back to Takeout-only and document the quota bound explicitly. Either outcome feeds the reconciliation scanner defined in the media-diet project frame, producing the measurable consumption-versus-production delta that is the point of the fourth personality vector.
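
Whichever source supplies the new watch events, the refresh needs to know which video IDs do not yet have a frame. A minimal sketch, assuming frames are stored as `{video_id}.md` files directly inside the vault directory named in the Method section (the function name `ids_missing_frames` is hypothetical):

```python
from pathlib import Path


def ids_missing_frames(vault_dir, video_ids):
    """Return the IDs (first-seen order, de-duplicated) with no frame file yet."""
    root = Path(vault_dir).expanduser()
    # Each frame is assumed to be named {video_id}.md, so the stem is the ID.
    existing = {path.stem for path in root.glob("*.md")} if root.is_dir() else set()
    return [vid for vid in dict.fromkeys(video_ids) if vid not in existing]
```

For example, `ids_missing_frames("~/vault/media/youtube", ids_from_latest_export)` would yield only the IDs that still need parsing and enrichment, which keeps a weekly run proportional to the new history rather than the whole archive.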