Breakthrough Memory bloomnet

x-sync CIB Detection: 8-algorithm ensemble, passive capture, knowledge graph accumulation

Manual investigation model: launch Chrome, create investigation, browse, export. Naive heuristics (FFR < 0.1, tweet density > 50/day). No cross-batch accumulation. -> Always-on passive capture of ALL GraphQL responses. 8-algorithm ensemble (SGBot, entropy, near-dup, temporal burst, OSoMe coordination, Benford's Law, Louvain community detection, Lysis convergence). Cross-batch knowledge graph with cluster evolution tracking. 227KB bundle, <376ms per-batch analysis.

Tags: breakthrough, bot-detection, knowledge-graph, social-media

What Changed

x-sync went from a manual investigation model (create an investigation, browse in a special Chrome session, export) to an always-on passive CIB (coordinated inauthentic behavior) detection system. Every GraphQL response during normal X/Twitter browsing is captured, and each time 200 unanalyzed tweets accumulate, automatic analysis runs through an 8-algorithm ensemble. Results accumulate into a persistent knowledge graph of accounts, coordination edges, and community clusters that grows over time.

Key Metrics

Metric | Before | After
Capture model | Manual investigation gating | Always-on passive (all GraphQL responses)
Detection algorithms | Naive heuristics (3) | Academic ensemble (8)
Per-batch analysis | None | SGBot + entropy + near-dup + temporal burst + coordination + Lysis fusion
Cross-batch analysis | None | Louvain clustering + Benford’s Law + Lysis rescore (8h alarm)
Knowledge graph | None | Accounts + edges + clusters with evolution tracking
Bundle size | 128KB | 227KB (+99KB for graphology)
Batch analysis time | N/A | <376ms budget per 200-tweet batch

Architecture

Three-phase pipeline implemented across 10 new/modified source files:

Phase 1: Capture Layer
  interceptor captures ALL GraphQL responses during normal browsing
  capture_buffer stores tweets + accounts + interaction edges
  batch trigger fires each time 200 unanalyzed tweets accumulate
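A minimal sketch of the capture-side batch trigger, assuming a hypothetical CapturedTweet shape and a hypothetical in-memory CaptureBuffer class (the real capture_buffer also stores accounts and interaction edges, and its schema is not shown here). Tweets are de-duplicated by id, because the same tweet can show up in more than one GraphQL response, and a batch is handed off each time 200 unanalyzed tweets accumulate. In an MV3 service worker this state would need to live in persistent storage, since the worker can be terminated between events.

```typescript
// Hypothetical tweet shape; the real capture_buffer schema is not shown in this note.
interface CapturedTweet {
  id: string;
  authorId: string;
  createdAt: number; // epoch milliseconds
  text: string;
}

type BatchHandler = (batch: CapturedTweet[]) => void;

// Hypothetical in-memory buffer; a real MV3 service worker would persist this state.
class CaptureBuffer {
  private pending: CapturedTweet[] = [];
  private seen = new Set<string>();
  private batchSize: number;
  private onBatch: BatchHandler;

  constructor(batchSize: number, onBatch: BatchHandler) {
    this.batchSize = batchSize;
    this.onBatch = onBatch;
  }

  // Store each tweet once; hand off a batch every time batchSize unanalyzed tweets accumulate.
  add(tweet: CapturedTweet): void {
    if (this.seen.has(tweet.id)) return; // already captured from another response
    this.seen.add(tweet.id);
    this.pending.push(tweet);
    if (this.pending.length >= this.batchSize) {
      this.onBatch(this.pending.splice(0, this.batchSize));
    }
  }

  get unanalyzedCount(): number {
    return this.pending.length;
  }
}

// Usage: 450 distinct tweets plus one duplicate yield two 200-tweet batches and 50 pending.
const batchSizes: number[] = [];
const buffer = new CaptureBuffer(200, (batch) => batchSizes.push(batch.length));
for (let i = 0; i < 450; i++) {
  buffer.add({ id: `t${i}`, authorId: `a${i % 7}`, createdAt: i * 1000, text: "..." });
}
buffer.add({ id: "t0", authorId: "a0", createdAt: 0, text: "..." }); // duplicate, ignored
```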

Phase 2: Per-Batch Analysis (runs in the MV3 service worker, <376ms budget)
  A1: SGBot 17-dimensional profile scoring (Yang et al. 2020, F1=0.86)
  A2: Shannon entropy on inter-tweet intervals (Chavoshi et al. 2016, F1=0.96)
  A3: Near-duplicate text via Dice coefficient >= 0.8 with 3-gram shingling
  A4: Poisson temporal burst detection (5-min windows, p < 0.01)
  A5: OSoMe TF-IDF cosine coordination (Luceri et al. 2024)
  A8: Lysis convergence ensemble (multi-signal fusion with convergence bonus)
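A sketch of the two simplest per-batch signals, A2 and A3, under stated assumptions: inter-tweet intervals are bucketed into 60-second bins before computing Shannon entropy, and shingles are character 3-grams over lowercased, whitespace-collapsed text. Neither the bin width nor word-vs-character shingling is specified in this note, so both are illustrative choices, not x-sync's actual parameters; the 0.8 Dice threshold comes from the list above.

```typescript
// A2 sketch: Shannon entropy (bits) of one account's inter-tweet intervals.
// The 60-second bin width is an assumption, not a documented x-sync parameter.
function intervalEntropy(timestampsMs: number[], binSeconds = 60): number {
  if (timestampsMs.length < 3) return Number.NaN; // need at least two intervals
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const bins = new Map<number, number>();
  for (let i = 1; i < sorted.length; i++) {
    const bin = Math.floor((sorted[i] - sorted[i - 1]) / 1000 / binSeconds);
    bins.set(bin, (bins.get(bin) ?? 0) + 1);
  }
  const n = sorted.length - 1;
  let h = 0;
  for (const count of bins.values()) {
    const p = count / n;
    h -= p * Math.log2(p);
  }
  return h; // 0 bits = every interval falls in the same bin (perfectly periodic)
}

// A3 sketch: character 3-gram shingles (assumption: the note does not say word vs character grams).
function shingles(text: string, n = 3): Set<string> {
  const t = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + n <= t.length; i++) grams.add(t.slice(i, i + n));
  return grams;
}

// Dice coefficient on shingle sets: 2|A ∩ B| / (|A| + |B|).
function dice(a: Set<string>, b: Set<string>): number {
  if (a.size + b.size === 0) return 1;
  let shared = 0;
  for (const g of a) if (b.has(g)) shared++;
  return (2 * shared) / (a.size + b.size);
}

function isNearDuplicate(x: string, y: string, threshold = 0.8): boolean {
  return dice(shingles(x), shingles(y)) >= threshold;
}
```

A perfectly periodic poster lands in a single bin and scores 0 bits; entropy rises as the variety of intervals grows, which is why low entropy is the bot signal.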

Phase 3: Cross-Batch Knowledge Graph (8-hour periodic alarm)
  A7: Louvain community detection via graphology (Blondel et al. 2008)
  A6: Benford's Law on follower/following counts (Golbeck 2019, 2 FP / 21K)
  Cluster evolution tracking via Jaccard overlap >= 0.5
  Network-aware Lysis rescore for all cluster members
  Dormant cluster detection (90-day threshold, never auto-deleted)
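Cluster evolution matching can be sketched as a greedy Jaccard assignment. The Cluster record, the evolveClusters function, the id scheme for new clusters and the greedy (rather than globally optimal) matching are all assumptions for illustration; the 0.5 overlap threshold and the 90-day dormancy rule come from the list above. Unmatched clusters are carried forward and flagged dormant, never removed.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical persisted cluster record.
interface Cluster {
  id: string;
  members: Set<string>;
  lastSeenMs: number;
  dormant: boolean;
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Each new community inherits the id of the best unclaimed previous cluster with
// Jaccard >= threshold. Unmatched previous clusters are kept and marked dormant once
// unseen for dormantDays; nothing is deleted.
function evolveClusters(
  previous: Cluster[],
  communities: Set<string>[],
  nowMs: number,
  threshold = 0.5,
  dormantDays = 90,
): Cluster[] {
  const claimed = new Set<string>();
  const result: Cluster[] = [];
  communities.forEach((members, i) => {
    let best: Cluster | undefined;
    let bestScore = -1;
    for (const c of previous) {
      if (claimed.has(c.id)) continue;
      const score = jaccard(c.members, members);
      if (score >= threshold && score > bestScore) {
        best = c;
        bestScore = score;
      }
    }
    if (best) claimed.add(best.id);
    result.push({ id: best ? best.id : `cluster-${nowMs}-${i}`, members, lastSeenMs: nowMs, dormant: false });
  });
  for (const c of previous) {
    if (!claimed.has(c.id)) {
      result.push({ ...c, dormant: nowMs - c.lastSeenMs >= dormantDays * DAY_MS });
    }
  }
  return result;
}
```

Matching on overlap rather than exact member sets lets a cluster keep its identity when one member leaves and another joins between runs.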

Algorithms (Academic Sources)

Algorithm | Source | Signal | Reported F1 / accuracy
SGBot profile vector | Yang et al. 2020 | Bot-farm account patterns from metadata | 0.86 F1
Inter-tweet entropy | Chavoshi et al. 2016 | Periodic posting patterns (low Shannon entropy) | 0.96 F1
Near-duplicate text | Dice coefficient | Copy-paste campaigns across accounts | -
Poisson temporal burst | Statistical test | Statistically improbable posting synchrony (p < 0.01) | -
OSoMe coordination | Luceri et al. 2024 | Shared-behavior TF-IDF cosine similarity | field-validated
Benford’s Law | Golbeck 2019 | Fake follower count distributions | 2 FP / 21,135
Louvain communities | Blondel et al. 2008 | Dense subgraph detection | -
Lysis convergence | Lysis Project 2024 | Multi-signal fusion with convergence bonus | field-validated
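The two purely statistical rows (A4 and A6) can be sketched as follows. Assumptions: the Poisson baseline rate is the batch's own mean count per 5-minute window (the real baseline source is not stated), and Benford conformity is decided by a chi-squared test with 8 degrees of freedom at p = 0.05 (critical value 15.507), reported alongside Pearson R between observed and expected first-digit frequencies. The note names chi-squared and Pearson R but not the decision thresholds, and the function names here are hypothetical.

```typescript
// A4 sketch: P(X >= k) for X ~ Poisson(lambda), summed term by term.
function poissonUpperTail(k: number, lambda: number): number {
  let term = Math.exp(-lambda); // P(X = 0)
  let cdf = 0;
  for (let i = 0; i < k; i++) {
    cdf += term;
    term *= lambda / (i + 1); // P(X = i + 1) from P(X = i)
  }
  return Math.max(0, 1 - cdf);
}

// Flag windows whose tweet count is improbable under the mean rate (assumed baseline).
function burstWindows(timestampsMs: number[], windowMs = 5 * 60 * 1000, alpha = 0.01): number[] {
  if (timestampsMs.length === 0) return [];
  const counts = new Map<number, number>();
  for (const t of timestampsMs) {
    const w = Math.floor(t / windowMs);
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  const ids = [...counts.keys()];
  const span = Math.max(...ids) - Math.min(...ids) + 1; // includes empty windows
  const lambda = timestampsMs.length / span;
  return ids.filter((w) => poissonUpperTail(counts.get(w)!, lambda) < alpha).sort((a, b) => a - b);
}

// A6 sketch: Benford first-digit test on follower/following counts.
const BENFORD = Array.from({ length: 9 }, (_, i) => Math.log10(1 + 1 / (i + 1)));
const CHI2_CRITICAL_8DF_P05 = 15.507; // chi-squared critical value, 8 degrees of freedom, p = 0.05

function leadingDigit(value: number): number {
  let x = Math.floor(Math.abs(value));
  while (x >= 10) x = Math.floor(x / 10);
  return x; // 0 for values below 1, which are skipped
}

function pearsonR(x: number[], y: number[]): number {
  const mean = (v: number[]) => v.reduce((s, a) => s + a, 0) / v.length;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

function benfordTest(values: number[]) {
  const counts = new Array<number>(9).fill(0);
  for (const v of values) {
    const d = leadingDigit(v);
    if (d > 0) counts[d - 1]++;
  }
  const n = counts.reduce((s, c) => s + c, 0);
  if (n === 0) return { n, chiSquared: Number.NaN, r: Number.NaN, conforms: false };
  let chiSquared = 0;
  for (let d = 0; d < 9; d++) {
    const expected = n * BENFORD[d];
    chiSquared += (counts[d] - expected) ** 2 / expected;
  }
  const r = pearsonR(counts.map((c) => c / n), BENFORD);
  return { n, chiSquared, r, conforms: chiSquared < CHI2_CRITICAL_8DF_P05 };
}
```

Using the batch's own mean as the baseline lets a large burst inflate lambda slightly; a real deployment would more likely compare against a historical rate.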

Key Design Decisions

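  • Per-batch vs periodic split: Coordination detection (A5) runs per-batch because it only needs edges already in the store. Louvain (A7) and Benford (A6) run on an 8-hour alarm because they need the full accumulated graph rather than a single batch; Louvain in particular is roughly O(n log n), too costly for the per-batch budget.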
  • Self-reinforcing loop prevention: The coordination-edge query excludes edges with edgeType === 'coordination', so derived signals don't feed back into and inflate cosine scores.
  • Edge-type isolation: All edge upserts filter by edgeType to prevent interaction edges (retweet/reply/quote/mention) from merging with coordination edges.
  • Bulk-fetch optimization: Graph-runner rescoring uses 3 bulk queries in total instead of 3 queries per account (3N), keeping the rescore within the MV3 service-worker timeout.
  • No auto-pruning: Per the research-pipeline memory, pruning is manual-only (investigation.mjs prune --confirm) to prevent unauthorized data loss.
  • Cluster evolution: Jaccard overlap >= 0.5 matching (not exact member sets) handles natural membership drift across graph analysis runs.
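The edge-type isolation and loop-prevention rules can be illustrated with a small A5 sketch. The Edge shape, the edgeType:target behavior features, the smoothed IDF, the function names and the 0.9 cosine threshold are all hypothetical; only the rules themselves (upserts keyed by edgeType, coordination edges excluded from the similarity input) come from the list above. Pairwise comparison is quadratic in the number of accounts compared.

```typescript
// Hypothetical edge shape; x-sync's real schema is not shown here.
type EdgeType = "retweet" | "reply" | "quote" | "mention" | "coordination";

interface Edge {
  source: string;
  target: string;
  edgeType: EdgeType;
  weight: number;
}

// Edge-type isolation: the upsert key includes edgeType, so interaction and
// coordination edges between the same two accounts never merge.
function upsertEdge(store: Map<string, Edge>, edge: Edge): void {
  const key = `${edge.source}|${edge.target}|${edge.edgeType}`;
  const existing = store.get(key);
  if (existing) existing.weight += edge.weight;
  else store.set(key, { ...edge });
}

// Per-account TF-IDF vectors over "edgeType:target" behaviors.
// Coordination edges are skipped so derived signals cannot inflate future scores.
function tfidfVectors(edges: Edge[]): Map<string, Map<string, number>> {
  const vectors = new Map<string, Map<string, number>>();
  for (const e of edges) {
    if (e.edgeType === "coordination") continue;
    const feature = `${e.edgeType}:${e.target}`;
    const row = vectors.get(e.source) ?? new Map<string, number>();
    row.set(feature, (row.get(feature) ?? 0) + e.weight);
    vectors.set(e.source, row);
  }
  const docFreq = new Map<string, number>();
  for (const row of vectors.values()) {
    for (const f of row.keys()) docFreq.set(f, (docFreq.get(f) ?? 0) + 1);
  }
  const n = vectors.size;
  for (const row of vectors.values()) {
    // Smoothed IDF keeps features shared by every account from collapsing to zero.
    for (const [f, tf] of row) row.set(f, tf * (Math.log((1 + n) / (1 + docFreq.get(f)!)) + 1));
  }
  return vectors;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [f, v] of a) {
    na += v * v;
    dot += v * (b.get(f) ?? 0);
  }
  for (const v of b.values()) nb += v * v;
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Emit a coordination edge for every account pair at or above the (assumed) threshold.
function coordinationEdges(edges: Edge[], threshold = 0.9): Edge[] {
  const accounts = [...tfidfVectors(edges)];
  const out: Edge[] = [];
  for (let i = 0; i < accounts.length; i++) {
    for (let j = i + 1; j < accounts.length; j++) {
      const score = cosine(accounts[i][1], accounts[j][1]);
      if (score >= threshold) {
        out.push({ source: accounts[i][0], target: accounts[j][0], edgeType: "coordination", weight: score });
      }
    }
  }
  return out;
}
```

Keying the upsert on edgeType is what keeps a retweet count and a coordination score for the same account pair from being summed into one weight.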

Files

File | Purpose
src/lib/analysis/sgbot.ts | A1: 17-dim profile scoring with bigram name model
src/lib/analysis/entropy.ts | A2: Shannon entropy with cross-batch histogram accumulation
src/lib/analysis/near-dup.ts | A3: Dice coefficient with 3-gram shingling
src/lib/analysis/temporal.ts | A4: Poisson temporal burst detection
src/lib/analysis/coordination.ts | A5: TF-IDF cosine coordination detection
src/lib/analysis/benford.ts | A6: Benford’s Law (chi-squared + Pearson R)
src/lib/analysis/louvain.ts | A7: Louvain community detection via graphology
src/lib/analysis/lysis.ts | A8: Convergence ensemble scorer
src/lib/analysis/batch-runner.ts | Per-batch orchestrator (A1-A5, A8)
src/lib/analysis/graph-runner.ts | Periodic graph orchestrator (A6-A7, rescore)
  • Plan: packages/x-sync/docs/superpowers/plans/2026-05-02-x-sync-bot-detection-pipeline.md
  • Pitfall: ~/vault/topics/pitfalls/agent-unauthorized-db-wipe.md
  • Pitfall: ~/vault/topics/pitfalls/wxt-mv3-sw-cache-blocks-iteration.md
  • Memory: feedback_x_sync_is_research_pipeline.md