The Meta Graph API plus a Google Takeout-style archive import can produce a complete vault-indexable corpus of personal feed activity (authored posts, reactions, comments, and, where available, impressions), sufficient to measure a consumption-side personality vector that complements the brand-voice production vector.

Changelog
| Date | Summary |
|---|---|
| 2026-04-17 | Initial experiment authored. Pipeline spec written. Results pending first run. |
Hypothesis
Meta holds a decade of my consumption and reaction data. The API exposes a meaningful slice; the archive export covers the rest. If the two paths combined can produce a frame-per-item corpus with at least the fields we need (timestamp, item type, author, caption, any reaction I left, and dwell or duration where exposed), then the result feeds the same ingest pattern that worked for brand-voice and apple-photos. A full run should produce thousands of frames without synthetic data, fully deduplicated and fully searchable.
Method
Two pipelines, one output path.
- Archive pull (primary for historical backfill). Request a data export from Facebook and Instagram settings. Both deliver a zipped archive. Parse `posts_and_comments/your_posts.json`, `activity_outside_meta/off-facebook_activity.json`, and the Instagram `liked_posts.json` and `saved_posts.json` where present. Emit one frame per item into `~/vault/media/meta/{platform}/{item_id}.md` with normalized fields: `source` (facebook or instagram), `item_type` (post, like, save, comment, share), `timestamp`, `author_handle`, `caption`, `media_url`, `permalink`.
- Graph API pull (primary for incremental). Register a Meta app and generate a long-lived user access token scoped to `user_posts`, `user_likes`, and any additional scopes that App Review approves. Sync via the `/me/posts`, `/me/likes`, and `/me/feed` endpoints. Dedupe against frames the archive import already created, keyed by `item_id`.
- OCR and caption extraction. For items where the archive ships a cached image, run OCR via the same pipeline apple-photos uses. Append extracted text to the frame body so items remain searchable by content, not only by metadata.
- Integration. Add a new ingest step (`media-diet-meta`) to the BloomNet ingest pipeline. Wire `rv index` to pick up the new directory. No new schema; reuse the existing frame type.
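The frame emission and `item_id` dedupe described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline itself: the `Frame` dataclass, the front-matter layout, and the helper names (`frame_path`, `render_frame`, `write_frame`, `ingest`) are all assumptions, and the sketch takes already-normalized records as input because the raw archive and API payload shapes are not specified here. It also assumes every item arrives with a stable `item_id` that is safe to use as a filename.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Frame:
    """Hypothetical in-memory form of one normalized item."""
    item_id: str
    source: str              # "facebook" or "instagram"
    item_type: str           # "post", "like", "save", "comment", or "share"
    timestamp: str           # assumed to be an ISO 8601 string
    author_handle: str = ""
    caption: str = ""
    media_url: str = ""
    permalink: str = ""


def frame_path(root: Path, frame: Frame) -> Path:
    # Mirrors the {platform}/{item_id}.md layout under the vault root.
    return root / frame.source / f"{frame.item_id}.md"


def render_frame(frame: Frame, body: str = "") -> str:
    # Assumed layout: key/value header block, then a body for OCR text.
    header = "\n".join(f"{k}: {json.dumps(v)}" for k, v in asdict(frame).items())
    return f"---\n{header}\n---\n\n{body}\n"


def write_frame(root: Path, frame: Frame, body: str = "") -> bool:
    """Write the frame unless one with the same item_id already exists."""
    path = frame_path(root, frame)
    if path.exists():
        return False  # already created by the other pipeline; skip
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_frame(frame, body), encoding="utf-8")
    return True


def ingest(root: Path, frames) -> dict:
    """Run either pipeline's frames through the same output path."""
    counts = {"written": 0, "skipped": 0}
    for frame in frames:
        counts["written" if write_frame(root, frame) else "skipped"] += 1
    return counts
```

Because both pipelines call the same `ingest` function against the same root, whichever runs second skips any `item_id` the first already wrote, and the `skipped` count doubles as a rough overlap measure. This only holds if the archive and the API report the same identifier for the same item, which the run itself would need to confirm.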
Results
Pending. Will measure:
- Corpus size (items extracted from archive, items extracted from API, overlap).
- Field coverage (percentage of frames with a non-empty caption, timestamp, author).
- Runtime (archive parse, API sync, total end-to-end).
- Dedup quality (frames that would have been duplicated across archive and API).
- Lane overlap against brand-voice production lanes (Geo/OSINT, Humor, Culture, Tech).
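Two of the measurements above, field coverage and archive/API overlap, reduce to simple counting. The sketch below is one hedged way to compute them; the function names are hypothetical, and frames are assumed to be available as plain dicts keyed by the normalized field names from the Method section.

```python
def field_coverage(frames, fields=("caption", "timestamp", "author_handle")):
    """Percentage of frames with a non-empty value for each field."""
    total = len(frames)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(1 for frame in frames if frame.get(field)) / total
        for field in fields
    }


def overlap_stats(archive_ids, api_ids):
    """Corpus size per path, plus how many item_ids both paths produced."""
    archive_ids, api_ids = set(archive_ids), set(api_ids)
    return {
        "archive": len(archive_ids),
        "api": len(api_ids),
        "overlap": len(archive_ids & api_ids),
        "total_unique": len(archive_ids | api_ids),
    }
```

The `overlap` figure is also the number of frames that would have been duplicated without the `item_id` dedupe, so it covers the dedup-quality measurement as well, provided the identifiers match across both paths.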
Findings
Pending.
Next Steps
If confirmed (both paths produce a joined, deduped, lane-classifiable corpus), integrate the reconciliation scanner that compares consumption lanes to production lanes. If refuted (API too restrictive, archive too sparse, dedup unreliable), fall back to archive-only and document which fields are permanently missing. Either outcome feeds the same downstream: a measurable delta between what I consume and what I produce, published as the fourth personality vector.
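The delta between consumption and production is not yet defined in this experiment. As one possible starting point, it could be taken as the per-lane difference in share of items. The sketch below assumes that definition and assumes each frame has already been classified into a lane; `lane_shares` and `lane_delta` are hypothetical names, not part of the existing scanner.

```python
def lane_shares(counts):
    """Convert per-lane item counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {lane: 0.0 for lane in counts}
    return {lane: n / total for lane, n in counts.items()}


def lane_delta(consumed, produced):
    """Per-lane share consumed minus share produced.

    Positive values mean a lane is consumed more than it is produced.
    Lanes missing from one side are treated as a zero share.
    """
    c, p = lane_shares(consumed), lane_shares(produced)
    lanes = sorted(set(c) | set(p))
    return {lane: c.get(lane, 0.0) - p.get(lane, 0.0) for lane in lanes}
```

Working in shares rather than raw counts keeps the comparison meaningful when the consumption corpus is much larger than the production corpus, which seems likely given the expected corpus size.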