x-sync CIB Detection: 8-algorithm ensemble, passive capture, knowledge graph accumulation
Before: a manual investigation model (launch Chrome, create an investigation, browse, export), naive heuristics (FFR < 0.1, tweet density > 50/day), and no cross-batch accumulation. After: always-on passive capture of ALL GraphQL responses, an 8-algorithm ensemble (SGBot, entropy, near-dup, temporal burst, OSoMe coordination, Benford's Law, Louvain community detection, Lysis convergence), and a cross-batch knowledge graph with cluster evolution tracking, in a 227KB bundle with a <376ms per-batch analysis budget.
What Changed
x-sync went from a manual investigation model (create an investigation, browse in a special Chrome session, export) to an always-on passive CIB detection system. Every GraphQL response seen during normal X/Twitter browsing is captured. Every 200 unanalyzed tweets trigger an automatic analysis pass through the 8-algorithm ensemble. Results accumulate into a persistent knowledge graph of accounts, coordination edges, and community clusters that grows over time.
Key Metrics
| Metric | Before | After |
|---|---|---|
| Capture model | Manual investigation gating | Always-on passive (all GraphQL responses) |
| Detection algorithms | Naive heuristics (3) | Academic ensemble (8) |
| Per-batch analysis | None | SGBot + entropy + near-dup + temporal burst + coordination + Lysis |
| Cross-batch analysis | None | Louvain clustering + Benford’s Law + Lysis rescore (8h alarm) |
| Knowledge graph | None | Accounts + edges + clusters with evolution tracking |
| Bundle size | 128KB | 227KB (+99KB for graphology) |
| Batch analysis time | N/A | <376ms budget per 200-tweet batch |
Architecture
Three-phase pipeline implemented across 10 new/modified source files:
Phase 1: Capture Layer
- interceptor captures ALL GraphQL responses during normal browsing
- capture_buffer stores tweets, accounts, and interaction edges
- batch trigger fires every 200 unanalyzed tweets
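The following is a minimal sketch of the Phase 1 batch trigger, assuming a simple tweet record. `CapturedTweet`, `CaptureBuffer`, and `onBatch` are hypothetical names, not the actual capture_buffer API. The buffer lives in memory here only for brevity. An MV3 service worker can be stopped at any time, so a real buffer would need persistent storage.

```typescript
// Hypothetical tweet shape; the real capture_buffer schema is not shown in this note.
interface CapturedTweet {
  id: string;
  authorId: string;
  createdAt: number; // epoch ms
  text: string;
}

const BATCH_SIZE = 200; // analysis fires once 200 unanalyzed tweets have accumulated

class CaptureBuffer {
  private tweets = new Map<string, CapturedTweet>(); // every captured tweet, by id
  private unanalyzed: string[] = []; // ids not yet handed to the batch runner
  private onBatch: (batch: CapturedTweet[]) => void;

  constructor(onBatch: (batch: CapturedTweet[]) => void) {
    this.onBatch = onBatch;
  }

  // Called for each tweet extracted from an intercepted GraphQL response.
  add(tweet: CapturedTweet): void {
    // The same tweet shows up in many timelines; count it only once.
    if (this.tweets.has(tweet.id)) return;
    this.tweets.set(tweet.id, tweet);
    this.unanalyzed.push(tweet.id);
    if (this.unanalyzed.length >= BATCH_SIZE) {
      const ids = this.unanalyzed.splice(0, BATCH_SIZE);
      this.onBatch(ids.map((id) => this.tweets.get(id)!));
    }
  }

  get pending(): number {
    return this.unanalyzed.length;
  }
}
```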
Phase 2: Per-Batch Analysis (runs in the MV3 service worker, <376ms budget)
- A1: SGBot 17-dimensional profile scoring (Yang et al. 2020, F1=0.86)
- A2: Shannon entropy on inter-tweet intervals (Chavoshi et al. 2016, F1=0.96)
- A3: Near-duplicate text via Dice coefficient >= 0.8 with 3-gram shingling
- A4: Poisson temporal burst detection (5-min windows, p < 0.01)
- A5: OSoMe TF-IDF cosine coordination (Luceri et al. 2024)
- A8: Lysis convergence ensemble (multi-signal fusion with convergence bonus)
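The A3 near-duplicate check can be sketched as follows. The note specifies a Dice coefficient >= 0.8 with 3-gram shingling but not the tokenization. Character 3-grams, lowercasing, and URL stripping are therefore assumptions, as are the helper names `shingles`, `dice`, and `nearDupPairs`.

```typescript
// Assumption: character 3-grams over lowercased text with URLs removed and whitespace collapsed.
function shingles(text: string, n = 3): Set<string> {
  const norm = text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, "") // assumption: links may differ between otherwise identical tweets
    .replace(/\s+/g, " ")
    .trim();
  const out = new Set<string>();
  for (let i = 0; i + n <= norm.length; i++) out.add(norm.slice(i, i + n));
  return out;
}

// Dice coefficient: 2|A ∩ B| / (|A| + |B|), in [0, 1].
function dice(a: Set<string>, b: Set<string>): number {
  if (a.size + b.size === 0) return 0;
  let shared = 0;
  for (const s of a) if (b.has(s)) shared++;
  return (2 * shared) / (a.size + b.size);
}

const NEAR_DUP_THRESHOLD = 0.8;

interface TextTweet {
  id: string;
  authorId: string;
  text: string;
}

// All pairs of tweets from different authors whose texts score >= 0.8.
function nearDupPairs(tweets: TextTweet[]): [string, string, number][] {
  const sets = tweets.map((t) => shingles(t.text));
  const pairs: [string, string, number][] = [];
  for (let i = 0; i < tweets.length; i++) {
    for (let j = i + 1; j < tweets.length; j++) {
      if (tweets[i].authorId === tweets[j].authorId) continue; // same account is not coordination
      const score = dice(sets[i], sets[j]);
      if (score >= NEAR_DUP_THRESHOLD) pairs.push([tweets[i].id, tweets[j].id, score]);
    }
  }
  return pairs;
}
```

Brute-force pairwise comparison costs 19,900 comparisons for a 200-tweet batch, which is cheap. Much larger comparison sets would typically call for MinHash or LSH instead.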
Phase 3: Cross-Batch Knowledge Graph (8-hour periodic alarm)
- A7: Louvain community detection via graphology (Blondel et al. 2008)
- A6: Benford's Law on follower/following counts (Golbeck 2019, 2 FP / 21K)
- Cluster evolution tracking via Jaccard overlap >= 0.5
- Network-aware Lysis rescore for all cluster members
- Dormant cluster detection (90-day threshold, never auto-deleted)
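Cluster evolution tracking via Jaccard overlap >= 0.5 might look like the sketch below. The note fixes only the 0.5 threshold. The greedy one-to-one matching and the `Cluster`, `jaccard`, and `matchClusters` names are assumptions.

```typescript
interface Cluster {
  id: string;
  members: Set<string>; // account ids
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets score 0 so they never match.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

const EVOLUTION_THRESHOLD = 0.5;

// Give each freshly detected community the id of the previous cluster it overlaps most,
// provided the overlap is >= 0.5 and that previous cluster has not been claimed yet.
// Unmatched communities are treated as new clusters.
function matchClusters(prev: Cluster[], next: Set<string>[], newId: () => string): Cluster[] {
  const claimed = new Set<string>();
  return next.map((members) => {
    let best: Cluster | undefined;
    let bestScore = 0;
    for (const c of prev) {
      if (claimed.has(c.id)) continue;
      const score = jaccard(c.members, members);
      if (score >= EVOLUTION_THRESHOLD && score > bestScore) {
        best = c;
        bestScore = score;
      }
    }
    if (best) {
      claimed.add(best.id);
      return { id: best.id, members };
    }
    return { id: newId(), members };
  });
}
```

Previous clusters left unclaimed after matching would be natural candidates for the 90-day dormant check, since clusters are never auto-deleted.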
Algorithms (Academic Sources)
| Algorithm | Source | Signal | F1/Accuracy |
|---|---|---|---|
| SGBot profile vector | Yang et al. 2020 | Bot-farm account patterns from metadata | 0.86 F1 |
| Inter-tweet entropy | Chavoshi et al. 2016 | Periodic posting patterns (low Shannon entropy) | 0.96 F1 |
| Near-duplicate text | Dice coefficient | Copy-paste campaigns across accounts | - |
| Poisson temporal burst | Statistical test | Statistically impossible posting synchrony | - |
| OSoMe coordination | Luceri et al. 2024 | Shared behavior TF-IDF cosine similarity | field-validated |
| Benford’s Law | Golbeck 2019 | Fake follower count distributions | 2 FP / 21,135 |
| Louvain communities | Blondel et al. 2008 | Dense subgraph detection | - |
| Lysis convergence | Lysis Project 2024 | Multi-signal fusion with convergence bonus | field-validated |
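To make the Poisson temporal burst row concrete, here is a sketch of A4's test under stated assumptions:

- The timestamps are posts that share one signal (for example the same URL or hashtag).
- The rate λ is the mean count per 5-minute window over the observed span.
- A window is flagged when P(X >= count) < 0.01.

The helper names `poissonUpperTail` and `burstWindows` are hypothetical.

```typescript
const WINDOW_MS = 5 * 60 * 1000; // 5-minute windows
const ALPHA = 0.01; // significance level

// P(X >= k) for X ~ Poisson(lambda), computed as 1 - P(X <= k - 1).
function poissonUpperTail(k: number, lambda: number): number {
  if (k <= 0) return 1;
  let term = Math.exp(-lambda); // P(X = 0)
  let cdf = term;
  for (let i = 1; i < k; i++) {
    term *= lambda / i; // P(X = i) derived from P(X = i - 1)
    cdf += term;
  }
  return Math.max(0, 1 - cdf);
}

interface Burst {
  windowStart: number; // epoch ms
  count: number;
  p: number;
}

// Assumption: the rate is the mean count per window over the observed span, empty windows included.
function burstWindows(timestamps: number[]): Burst[] {
  if (timestamps.length === 0) return [];
  const counts = new Map<number, number>();
  for (const t of timestamps) {
    const w = Math.floor(t / WINDOW_MS);
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  let first = Infinity;
  let last = -Infinity;
  for (const w of counts.keys()) {
    first = Math.min(first, w);
    last = Math.max(last, w);
  }
  const lambda = timestamps.length / (last - first + 1);
  const bursts: Burst[] = [];
  for (const [w, count] of counts) {
    const p = poissonUpperTail(count, lambda);
    if (p < ALPHA) bursts.push({ windowStart: w * WINDOW_MS, count, p });
  }
  return bursts;
}
```

Because the burst window itself contributes to λ, this estimate is conservative. A variant could estimate the rate from the other windows only. Computing the tail as 1 - CDF underflows to 0 for extreme counts. That is harmless for thresholding but not for reporting exact p-values.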
Key Design Decisions
- Per-batch vs periodic split: Coordination detection (A5) runs per-batch because it works from the edges already in the store. Louvain (A7) and Benford (A6) run on the 8-hour alarm because they need the full accumulated graph (Louvain in particular is O(n log n)).
- Self-reinforcing loop prevention: The coordination edge query filters out `edgeType === 'coordination'`, so derivative signals don't inflate cosine scores.
- Edge-type isolation: All edge upserts filter by `edgeType` to prevent interaction edges (retweet/reply/quote/mention) from merging with coordination edges.
- Bulk-fetch optimization: Graph-runner rescoring uses 3 bulk queries instead of O(3N) per-account queries to stay within the MV3 SW timeout.
- No auto-pruning: Per the research pipeline memory, pruning is manual-only (`investigation.mjs prune --confirm`) to prevent unauthorized data loss.
- Cluster evolution: Jaccard overlap >= 0.5 matching (not exact member sets) handles natural membership drift across graph analysis runs.
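The edge-type isolation and loop-prevention decisions can be illustrated with a small in-memory store. `EdgeStore` and its methods are hypothetical stand-ins for the real storage layer. The sketch shows two things: `edgeType` is part of the upsert key, and the A5 input excludes `coordination` edges.

```typescript
type EdgeType = "retweet" | "reply" | "quote" | "mention" | "coordination";

interface Edge {
  source: string;
  target: string;
  edgeType: EdgeType;
  weight: number;
}

// Hypothetical in-memory edge store. The key includes edgeType, so a retweet edge and a
// coordination edge between the same two accounts are stored and weighted separately.
class EdgeStore {
  private edges = new Map<string, Edge>();

  upsert(source: string, target: string, edgeType: EdgeType, weightDelta = 1): void {
    const key = `${source}|${target}|${edgeType}`; // assumes account ids never contain "|"
    const existing = this.edges.get(key);
    if (existing) existing.weight += weightDelta;
    else this.edges.set(key, { source, target, edgeType, weight: weightDelta });
  }

  // Input for A5: interaction edges only, so coordination edges derived in earlier
  // batches cannot feed back into the next round of cosine scores.
  interactionEdges(): Edge[] {
    return [...this.edges.values()].filter((e) => e.edgeType !== "coordination");
  }

  get size(): number {
    return this.edges.size;
  }
}
```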
Files
| File | Purpose |
|---|---|
| src/lib/analysis/sgbot.ts | A1: 17-dim profile scoring with bigram name model |
| src/lib/analysis/entropy.ts | A2: Shannon entropy with cross-batch histogram accumulation |
| src/lib/analysis/near-dup.ts | A3: Dice coefficient with 3-gram shingling |
| src/lib/analysis/temporal.ts | A4: Poisson temporal burst detection |
| src/lib/analysis/coordination.ts | A5: TF-IDF cosine coordination detection |
| src/lib/analysis/benford.ts | A6: Benford's Law (chi-squared + Pearson R) |
| src/lib/analysis/louvain.ts | A7: Louvain community detection via graphology |
| src/lib/analysis/lysis.ts | A8: Convergence ensemble scorer |
| src/lib/analysis/batch-runner.ts | Per-batch orchestrator (A1-A5, A8) |
| src/lib/analysis/graph-runner.ts | Periodic graph orchestrator (A6-A7, rescore) |
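The sketch below is a simplified illustration of what entropy.ts's cross-batch histogram accumulation could look like. It carries each account's last timestamp between batches, so the interval spanning a batch boundary is still counted. The log2-spaced interval bins are an assumption and do not reproduce Chavoshi et al.'s exact binning. The names `IntervalState`, `accumulate`, and `intervalEntropy` are also assumptions, not the file's actual API.

```typescript
// Hypothetical per-account state, persisted between batches.
interface IntervalState {
  lastTs?: number; // latest tweet time seen for this account (epoch ms)
  hist: Map<number, number>; // interval bin -> count, accumulated across batches
}

// Assumption: log2-spaced bins over the interval length in seconds (1s, 2s, 4s, ...).
function binOf(intervalMs: number): number {
  return Math.floor(Math.log2(Math.max(1, intervalMs / 1000)));
}

// Fold one batch of an account's tweet timestamps into its running histogram.
function accumulate(state: IntervalState, timestamps: number[]): void {
  const sorted = [...timestamps].sort((a, b) => a - b);
  for (const ts of sorted) {
    if (state.lastTs === undefined) {
      state.lastTs = ts;
      continue;
    }
    if (ts <= state.lastTs) continue; // already seen, or older than the last captured tweet
    const bin = binOf(ts - state.lastTs);
    state.hist.set(bin, (state.hist.get(bin) ?? 0) + 1);
    state.lastTs = ts;
  }
}

// Shannon entropy (bits) of the interval histogram; near 0 means clockwork posting.
function intervalEntropy(hist: Map<number, number>): number {
  let total = 0;
  for (const c of hist.values()) total += c;
  if (total === 0) return 0;
  let h = 0;
  for (const c of hist.values()) {
    const p = c / total;
    h -= p * Math.log2(p);
  }
  return h;
}
```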
Related
- Plan: `packages/x-sync/docs/superpowers/plans/2026-05-02-x-sync-bot-detection-pipeline.md`
- Pitfall: `~/vault/topics/pitfalls/agent-unauthorized-db-wipe.md`
- Pitfall: `~/vault/topics/pitfalls/wxt-mv3-sw-cache-blocks-iteration.md`
- Memory: `feedback_x_sync_is_research_pipeline.md`