Experiment Personality media-diet

The Meta Graph API plus a Google Takeout-style archive import can produce a complete vault-indexable corpus of personal feed activity (authored posts, reactions, comments, and, where available, impressions), sufficient to measure a consumption-side personality vector that complements the brand-voice production vector.

Tags: personality, media, ingest, public-lab

Result: pending

Changelog

Date        Summary
2026-04-17  Initial experiment authored. Pipeline spec written. Result still pending run.

Hypothesis

Meta holds a decade of my consumption and reaction data. The API exposes a meaningful slice; the archive export covers the rest. If the two paths combined can produce a frame-per-item corpus carrying at least the fields we need (timestamp, item type, author, caption, any reaction I left, and dwell or duration where exposed), the result can feed the same ingest pattern that worked for brand-voice and apple-photos. A full run should produce thousands of deduplicated, searchable frames with no synthetic data.

Method

Two pull pipelines, one output path, followed by shared extraction and integration steps.

  1. Archive pull (primary for historical backfill). Request a data export in JSON format from the Facebook and Instagram settings. Both deliver a zipped archive. Parse posts_and_comments/your_posts.json, activity_outside_meta/off-facebook_activity.json, and the Instagram liked_posts.json and saved_posts.json where present. Emit one frame per item into ~/vault/media/meta/{platform}/{item_id}.md with normalized fields: source (facebook or instagram), item_type (post, like, save, comment, share), timestamp, author_handle, caption, media_url, permalink.
  2. Graph API pull (primary for incremental sync). Register a Meta app and obtain a long-lived user access token scoped to user_posts, user_likes, and any additional scopes that App Review approves. Sync via the /me/posts, /me/likes, and /me/feed endpoints. Dedupe against frames the archive import already created, keyed by item_id.
  3. OCR and caption extraction. For items where the archive ships a cached image, run OCR via the same pipeline apple-photos uses. Append extracted text to the frame body so items remain searchable by content, not only by metadata.
  4. Integration. Add a new ingest step (media-diet-meta) to the BloomNet ingest pipeline. Wire rv index to pick up the new directory. No new schema; reuse the existing frame type.
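
The frame emission and item_id dedupe described in steps 1 and 2 could be sketched roughly as below. This is a minimal illustration, not the pipeline's actual code: the Frame dataclass, the helper names (frame_path, render_frame, write_frame), and the front-matter layout are all assumptions made for the example. It also assumes item_id is already safe to use as a file name.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Frame:
    """Normalized fields from step 1 (hypothetical container for this sketch)."""
    source: str            # "facebook" or "instagram"
    item_type: str         # "post", "like", "save", "comment", "share"
    item_id: str           # dedupe key shared by archive and API pulls
    timestamp: str         # assumed to be an ISO 8601 string
    author_handle: str = ""
    caption: str = ""
    media_url: str = ""
    permalink: str = ""


def frame_path(vault: Path, frame: Frame) -> Path:
    # Mirrors the {vault}/media/meta/{platform}/{item_id}.md layout from step 1.
    return vault / "media" / "meta" / frame.source / f"{frame.item_id}.md"


def render_frame(frame: Frame, body: str = "") -> str:
    # Assumed layout: "key: value" header between --- fences, then the body
    # (for example OCR text from step 3). Values are JSON-quoted for safety.
    header = [
        f"{key}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in asdict(frame).items()
    ]
    return "\n".join(["---", *header, "---", "", body]).rstrip() + "\n"


def write_frame(vault: Path, frame: Frame, body: str = "") -> bool:
    """Write the frame unless one with the same item_id already exists.

    Returns True when a new file was written, False when it was skipped.
    """
    path = frame_path(vault, frame)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_frame(frame, body), encoding="utf-8")
    return True
```

In this sketch both pulls would call write_frame with vault set to Path("~/vault").expanduser(). Because the archive import runs first, an API item with an item_id already on disk returns False and leaves the archive frame untouched, which also gives a simple count of would-be duplicates.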

Results

Pending. Will measure:

  • Corpus size (items extracted from archive, items extracted from API, overlap).
  • Field coverage (percentage of frames with a non-empty caption, timestamp, author).
  • Runtime (archive parse, API sync, total end-to-end).
  • Dedup quality (frames that would have been duplicated across archive and API).
  • Lane overlap against brand-voice production lanes (Geo/OSINT, Humor, Culture, Tech).
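
Two of the measurements above, field coverage and archive/API overlap, are simple enough to sketch ahead of the run. The function names and the representation of frames as plain dicts are assumptions for illustration; the real measurement may read frames from the vault differently.

```python
from typing import Dict, Iterable, List, Sequence


def field_coverage(
    frames: List[dict],
    fields: Sequence[str] = ("caption", "timestamp", "author_handle"),
) -> Dict[str, float]:
    """Percentage of frames with a non-empty value for each field."""
    total = len(frames)
    if total == 0:
        return {name: 0.0 for name in fields}
    return {
        name: 100.0 * sum(1 for frame in frames if frame.get(name)) / total
        for name in fields
    }


def corpus_overlap(archive_ids: Iterable[str], api_ids: Iterable[str]) -> Dict[str, int]:
    """Corpus size per path, plus the item_ids seen by both paths."""
    archive, api = set(archive_ids), set(api_ids)
    return {
        "archive": len(archive),
        "api": len(api),
        "overlap": len(archive & api),   # frames that would have been duplicated
        "total": len(archive | api),     # deduplicated corpus size
    }
```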

Findings

Pending.

Next Steps

If confirmed (both paths produce a joined, deduped, lane-classifiable corpus), integrate the reconciliation scanner that compares consumption lanes to production lanes. If refuted (API too restrictive, archive too sparse, dedup unreliable), fall back to archive-only and document which fields are permanently missing. Either outcome feeds the same downstream: a measurable delta between what I consume and what I produce, published as the fourth personality vector.
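
One possible shape for the consumption-versus-production delta is a per-lane difference in share, sketched below. This is only an assumption about how the reconciliation scanner might report its result: the function names are hypothetical, lane classification itself is out of scope here, and each item is assumed to already carry one of the four brand-voice lane labels.

```python
from collections import Counter
from typing import Dict, Iterable

# Lane names taken from the brand-voice production lanes listed under Results.
LANES = ("Geo/OSINT", "Humor", "Culture", "Tech")


def lane_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of labeled items per lane; labels outside LANES are ignored."""
    counts = Counter(label for label in labels if label in LANES)
    total = sum(counts.values())
    return {lane: (counts[lane] / total if total else 0.0) for lane in LANES}


def lane_delta(consumed: Iterable[str], produced: Iterable[str]) -> Dict[str, float]:
    """Consumption share minus production share for each lane.

    Positive values mean a lane is consumed more than it is produced.
    """
    consumed_shares = lane_shares(consumed)
    produced_shares = lane_shares(produced)
    return {lane: consumed_shares[lane] - produced_shares[lane] for lane in LANES}
```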