Breakthrough Memory bloomnet

x-sync CIB Detection: 8-algorithm ensemble, passive capture, knowledge graph accumulation

Before: a manual investigation model (launch Chrome, create an investigation, browse, export) with naive heuristics (FFR < 0.1, tweet density > 50/day) and no cross-batch accumulation.

After: always-on passive capture of all GraphQL responses, an 8-algorithm ensemble (SGBot, entropy, near-dup, temporal burst, OSoMe coordination, Benford's Law, Louvain community detection, Lysis convergence), and a cross-batch knowledge graph with cluster evolution tracking. The bundle is 227KB and per-batch analysis has a <376ms budget.

May 1, 2026

breakthrough, bot-detection, knowledge-graph, social-media

What Changed

x-sync went from a manual investigation model (create investigation, browse in special Chrome session, export) to an always-on passive CIB detection system. Every GraphQL response during normal X/Twitter browsing is captured, and every 200 tweets triggers automatic analysis through an 8-algorithm ensemble. Results accumulate into a persistent knowledge graph of accounts, coordination edges, and community clusters that grows over time.
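
The batch trigger described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual capture_buffer: the `CapturedTweet` shape, the `CaptureBuffer` class, and de-duplication by tweet id are all assumptions made for the example. Only the 200-tweet batch size comes from the text.

```typescript
// Hypothetical tweet shape; the real capture_buffer schema is not shown in the source.
interface CapturedTweet {
  id: string;
  authorId: string;
  text: string;
  createdAtMs: number;
}

// From the text: analysis fires at every 200 unanalyzed tweets.
const BATCH_SIZE = 200;

class CaptureBuffer {
  private pending: CapturedTweet[] = [];
  private seen = new Set<string>();
  private onBatch: (batch: CapturedTweet[]) => void;
  private batchSize: number;

  constructor(onBatch: (batch: CapturedTweet[]) => void, batchSize: number = BATCH_SIZE) {
    this.onBatch = onBatch;
    this.batchSize = batchSize;
  }

  // Add tweets parsed from one GraphQL response.
  // Skipping ids already seen is an assumption, not a documented behavior.
  add(tweets: CapturedTweet[]): void {
    for (const t of tweets) {
      if (this.seen.has(t.id)) continue;
      this.seen.add(t.id);
      this.pending.push(t);
    }
    // Hand off every full batch; any remainder waits for later responses.
    while (this.pending.length >= this.batchSize) {
      this.onBatch(this.pending.splice(0, this.batchSize));
    }
  }

  get unanalyzedCount(): number {
    return this.pending.length;
  }
}
```

In this sketch the callback would be the per-batch orchestrator, and the buffer never blocks capture: responses keep accumulating while a batch is analyzed.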

Key Metrics

Metric	Before	After
Capture model	Manual investigation gating	Always-on passive (all GraphQL responses)
Detection algorithms	Naive heuristics (3)	Academic ensemble (8)
Per-batch analysis	None	SGBot + entropy + near-dup + temporal burst + coordination + Lysis convergence
Cross-batch analysis	None	Louvain clustering + Benford’s Law + Lysis rescore (8h alarm)
Knowledge graph	None	Accounts + edges + clusters with evolution tracking
Bundle size	128KB	227KB (+99KB for graphology)
Batch analysis time	N/A	<376ms budget per 200-tweet batch

Architecture

Three-phase pipeline implemented across 10 new/modified source files:

Phase 1: Capture Layer
  interceptor captures ALL GraphQL responses during normal browsing
  capture_buffer stores tweets + accounts + interaction edges
  batch trigger fires at every 200 unanalyzed tweets

Phase 2: Per-Batch Analysis (runs in SW, <376ms)
  A1: SGBot 17-dimensional profile scoring (Yang et al. 2020, F1=0.86)
  A2: Shannon entropy on inter-tweet intervals (Chavoshi et al. 2016, F1=0.96)
  A3: Near-duplicate text via Dice coefficient >= 0.8 with 3-gram shingling
  A4: Poisson temporal burst detection (5-min windows, p < 0.01)
  A5: OSoMe TF-IDF cosine coordination (Luceri et al. 2024)
  A8: Lysis convergence ensemble (multi-signal fusion with convergence bonus)

Phase 3: Cross-Batch Knowledge Graph (8-hour periodic alarm)
  A7: Louvain community detection via graphology (Blondel et al. 2008)
  A6: Benford's Law on follower/following counts (Golbeck 2019, 2 FP / 21K)
  Cluster evolution tracking via Jaccard overlap >= 0.5
  Network-aware Lysis rescore for all cluster members
  Dormant cluster detection (90-day threshold, never auto-deleted)
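
As an illustration of step A3 in the pipeline above, here is a minimal sketch of near-duplicate detection with a Dice coefficient over 3-gram shingles. The 0.8 threshold and the 3-gram size come from the outline. Character-level (rather than word-level) shingles, the lowercase and whitespace normalization, and all function names are assumptions for the example.

```typescript
// Character 3-gram shingles over lowercased, whitespace-collapsed text.
// The normalization and the character-level choice are assumptions.
function shingles3(text: string): Set<string> {
  const norm = text.toLowerCase().replace(/\s+/g, " ").trim();
  const out = new Set<string>();
  for (let i = 0; i + 3 <= norm.length; i++) {
    out.add(norm.slice(i, i + 3));
  }
  return out;
}

// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), in the range [0, 1].
function diceCoefficient(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  return (2 * shared) / (a.size + b.size);
}

// Threshold from the outline: Dice >= 0.8 counts as a near-duplicate.
const NEAR_DUP_THRESHOLD = 0.8;

function isNearDuplicate(x: string, y: string): boolean {
  return diceCoefficient(shingles3(x), shingles3(y)) >= NEAR_DUP_THRESHOLD;
}
```

Comparing every pair of tweets in a 200-tweet batch is about 20,000 comparisons, which is small enough that a sketch like this needs no indexing. Whether the real implementation uses one is not stated.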

Algorithms (Academic Sources)

Algorithm	Source	Signal	F1/Accuracy
SGBot profile vector	Yang et al. 2020	Bot-farm account patterns from metadata	0.86 F1
Inter-tweet entropy	Chavoshi et al. 2016	Periodic posting patterns (low Shannon entropy)	0.96 F1
Near-duplicate text	Dice coefficient	Copy-paste campaigns across accounts	-
Poisson temporal burst	Statistical test	Statistically impossible posting synchrony	-
OSoMe coordination	Luceri et al. 2024	Shared behavior TF-IDF cosine similarity	field-validated
Benford’s Law	Golbeck 2019	Fake follower count distributions	2 FP / 21,135
Louvain communities	Blondel et al. 2008	Dense subgraph detection	-
Lysis convergence	Lysis Project 2024	Multi-signal fusion with convergence bonus	field-validated
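
The inter-tweet entropy row above flags periodic posting through low Shannon entropy. A minimal sketch of that idea follows, assuming fixed-width bins over the gaps between consecutive tweets. The bin width, the function name, and the binning scheme are illustrative assumptions and are not taken from `entropy.ts`.

```typescript
// Shannon entropy (in bits) of the histogram of inter-tweet intervals.
// binWidthMs is an assumed parameter; the source does not state the binning.
function intervalEntropyBits(timestampsMs: number[], binWidthMs: number = 60000): number {
  const sorted = timestampsMs.slice().sort((a, b) => a - b);
  if (sorted.length < 2) return 0;

  // Histogram of gaps between consecutive tweets, bucketed by bin index.
  const counts = new Map<number, number>();
  for (let i = 1; i < sorted.length; i++) {
    const bin = Math.floor((sorted[i] - sorted[i - 1]) / binWidthMs);
    counts.set(bin, (counts.get(bin) ?? 0) + 1);
  }

  // H = -sum(p * log2(p)) over the occupied bins.
  const total = sorted.length - 1;
  let h = 0;
  counts.forEach((c) => {
    const p = c / total;
    h -= p * Math.log2(p);
  });
  return h;
}
```

An account posting on a fixed schedule puts every gap in one bin and scores 0 bits, while irregular human posting spreads gaps across many bins and scores higher. Keeping the `counts` histogram per account between batches would be one way to get the cross-batch accumulation mentioned in the file list.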

Key Design Decisions

Per-batch vs periodic split: Coordination detection (A5) runs per-batch because it uses existing edges. Louvain (A7) + Benford (A6) run on an 8-hour alarm because they need the full graph and are O(n log n).
Self-reinforcing loop prevention: Coordination edge query filters out edgeType === 'coordination' so derivative signals don’t inflate cosine scores.
Edge-type isolation: All edge upserts filter by edgeType to prevent interaction edges (retweet/reply/quote/mention) from merging with coordination edges.
Bulk-fetch optimization: Graph-runner rescoring issues 3 bulk queries instead of 3N per-account queries (three per account), which keeps it within the MV3 service worker timeout.
No auto-pruning: Per research pipeline memory, pruning is manual-only (investigation.mjs prune --confirm) to prevent unauthorized data loss.
Cluster evolution: Jaccard overlap >= 0.5 matching (not exact member sets) handles natural membership drift across graph analysis runs.
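
The cluster evolution decision above can be sketched as follows. This is a minimal illustration under stated assumptions: the `ClusterSnapshot` shape, the function names, and the "best match wins" rule are hypothetical. Only the Jaccard overlap >= 0.5 criterion comes from the text.

```typescript
// Hypothetical snapshot of one cluster from a graph analysis run.
interface ClusterSnapshot {
  id: string;
  members: Set<string>;
}

// Jaccard overlap: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let inter = 0;
  a.forEach((m) => {
    if (b.has(m)) inter++;
  });
  return inter / (a.size + b.size - inter);
}

// Threshold from the text: overlap >= 0.5 means "same cluster, evolved".
const EVOLUTION_THRESHOLD = 0.5;

// Map each current cluster id to the previous cluster it continues,
// or null when nothing overlaps enough (treated here as a new cluster).
function matchClusters(
  previous: ClusterSnapshot[],
  current: ClusterSnapshot[],
): Map<string, string | null> {
  const lineage = new Map<string, string | null>();
  for (const cur of current) {
    let bestId: string | null = null;
    let best = 0;
    for (const prev of previous) {
      const score = jaccard(prev.members, cur.members);
      if (score >= EVOLUTION_THRESHOLD && score > best) {
        best = score;
        bestId = prev.id;
      }
    }
    lineage.set(cur.id, bestId);
  }
  return lineage;
}
```

For example, a previous cluster {a, b, c, d} and a current cluster {a, b, c, e} share 3 of 5 distinct members (0.6), so they count as the same cluster despite one member changing. An exact-set comparison would have treated them as unrelated.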

Files

File	Purpose
`src/lib/analysis/sgbot.ts`	A1: 17-dim profile scoring with bigram name model
`src/lib/analysis/entropy.ts`	A2: Shannon entropy with cross-batch histogram accumulation
`src/lib/analysis/near-dup.ts`	A3: Dice coefficient with 3-gram shingling
`src/lib/analysis/temporal.ts`	A4: Poisson temporal burst detection
`src/lib/analysis/coordination.ts`	A5: TF-IDF cosine coordination detection
`src/lib/analysis/benford.ts`	A6: Benford’s Law (chi-squared + Pearson R)
`src/lib/analysis/louvain.ts`	A7: Louvain community detection via graphology
`src/lib/analysis/lysis.ts`	A8: Convergence ensemble scorer
`src/lib/analysis/batch-runner.ts`	Per-batch orchestrator (A1-A5, A8)
`src/lib/analysis/graph-runner.ts`	Periodic graph orchestrator (A6-A7, rescore)
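
To make the A6 entry above more concrete, here is a minimal sketch of the chi-squared half of a Benford's Law check on first digits. It is not the contents of `benford.ts`: the function names, the handling of counts below 1, and the return shape are assumptions, and the Pearson R part is omitted.

```typescript
// Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1..9.
const BENFORD_EXPECTED: number[] = [];
for (let d = 1; d <= 9; d++) {
  BENFORD_EXPECTED.push(Math.log10(1 + 1 / d));
}

// Leading decimal digit of a count, or null for values below 1
// (skipping those is an assumption; zero has no leading digit).
function firstDigit(n: number): number | null {
  if (!isFinite(n) || n < 1) return null;
  let v = Math.floor(n);
  while (v >= 10) v = Math.floor(v / 10);
  return v;
}

// Chi-squared statistic of observed first digits against Benford's distribution.
function benfordChiSquared(counts: number[]): { chi2: number; sampleSize: number } {
  const observed = [0, 0, 0, 0, 0, 0, 0, 0, 0];
  let total = 0;
  for (const c of counts) {
    const d = firstDigit(c);
    if (d === null) continue;
    observed[d - 1]++;
    total++;
  }
  if (total === 0) return { chi2: 0, sampleSize: 0 };

  let chi2 = 0;
  for (let i = 0; i < 9; i++) {
    const expected = BENFORD_EXPECTED[i] * total;
    chi2 += (observed[i] - expected) ** 2 / expected;
  }
  return { chi2, sampleSize: total };
}
```

With nine digit classes the statistic has 8 degrees of freedom, for which the 5% critical value is about 15.51. Which cutoff x-sync actually applies, and how it combines the result with Pearson R, is not stated in this note.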

Plan: packages/x-sync/docs/superpowers/plans/2026-05-02-x-sync-bot-detection-pipeline.md
Pitfall: ~/vault/topics/pitfalls/agent-unauthorized-db-wipe.md
Pitfall: ~/vault/topics/pitfalls/wxt-mv3-sw-cache-blocks-iteration.md
Memory: feedback_x_sync_is_research_pipeline.md