Hybrid Search
Combining BM25 keyword search with vector similarity search, fused via Reciprocal Rank Fusion (RRF).
Hybrid search combines two retrieval methods that fail in complementary ways. BM25 (sparse keyword search) excels at exact term matching but misses synonyms and semantic relationships. Vector similarity search (dense retrieval) captures semantic meaning but can miss exact phrases and rare terms. Reciprocal Rank Fusion (RRF) merges the two ranked lists by summing each document's reciprocal ranks, which typically yields a better ranking than either method alone.
How It Works
Each query runs against both indexes independently. BM25 returns a list ranked by BM25 score, which combines term frequency, inverse document frequency, and document-length normalization. Vector search returns a list ranked by cosine or dot-product similarity to the query embedding. RRF scores each document as the sum of 1/(k + rank) over the lists it appears in, where rank is the document's 1-based position in that list and k is a constant (typically 60). A document missing from a list contributes nothing for that list, so documents ranked high in both lists score highest.
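The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not the vault engine's actual code: the function name rrf_fuse, the document IDs, and the two sample rankings are hypothetical, and the rankings are assumed to be already-sorted lists of document IDs, best match first.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list is assumed to be ordered best-first. Returns (doc_id, score)
    pairs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties on doc_id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for one query from the two indexes.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

In this sample, doc_b (ranks 2 and 1) edges out doc_a (ranks 1 and 3), and both land above doc_d and doc_c, which each appear in only one list. Because RRF uses ranks only, the raw BM25 and similarity scores never need to be normalized onto a common scale.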
Example
The vault knowledge engine (Apr 8-9) uses Tantivy for BM25 and HNSW for vector similarity, fused with RRF. The first run indexed 711 frames. Hybrid search makes the vault accessible to Claude Code sessions via 20 MCP tools: a query that uses a near-synonym, rather than the exact field name from a pitfall file, still retrieves the right document.
Related
- Embeddings: the dense vector component
- Knowledge Graph
- Model Context Protocol
- Token