Assumed knowledge
- Dense vs. sparse retrieval
  - Dual-encoder (“two-tower”): query and document are encoded separately into single vectors, e.g. Dense Passage Retrieval (DPR), Karpukhin et al., EMNLP 2020 (aclanthology.org).
  - Sparse, term-based retrieval: BM25 and its probabilistic relevance model. Intro tutorial: “What is BM25…” GeeksforGeeks, Jul 23 2025 (geeksforgeeks.org).
- Approximate nearest neighbor (ANN) search
  - Graph-based (HNSW), clustering (IVF-PQ), quantization. See Faiss v1.11.0 release notes (github.com).
- Late interaction retrieval
  - Multi-vector representations allowing token-level interaction at search time. See ColBERT (SIGIR 2020) (arxiv.org).
- Retrieval-augmented generation (RAG)
- Infrastructure at scale
  - Disk-based vector indexes (DiskANN) vs. in-memory (HNSW). DiskANN in SQL Server 2025 public preview May 19 2025 (techcommunity.microsoft.com).
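To make the ANN trade-off concrete: approximate indexes such as HNSW or IVF-PQ return most, but not necessarily all, of the true nearest neighbors, and the usual check is recall@k against brute-force search. A minimal pure-Python sketch, assuming inner-product similarity; the toy vectors and the `approx_ids` result are made up for illustration and do not come from any real index:

```python
def exact_top_k(query, vectors, k):
    """Brute-force top-k ids by inner product (the ground truth for ANN evaluation)."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy 2-D corpus (illustrative values only).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

exact_ids = exact_top_k(query, vectors, k=3)
# Hypothetical output of an ANN index that missed one true neighbor.
approx_ids = [0, 3, 2]
recall = recall_at_k(approx_ids, exact_ids)
```

Brute force costs one comparison per stored vector, so in practice it serves only as a reference on a sample of queries.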
Areas still evolving (tracked in DeltaDrops):
- Late interaction efficiency & compression (ColBERTv2 → PLAID → SPLATE).
- Learned sparse retrievers (SPLADE-v3, Mistral-SPLADE).
- Generative retrieval & hybrid models (Atlas, LIGER, GeAR).
- Vector infrastructure: on-disk indexes, quantization, continuous indexing, scale to trillions.
- Hybrid sparse‐dense and multi-stage pipelines (BM25 → dense → cross-encoder).
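The multi-stage pattern in the last item (BM25 → dense → cross-encoder) is a cascade: each stage rescores a shrinking candidate set with a more expensive model. A minimal sketch of that control flow follows. The three scorers are cheap hypothetical stand-ins chosen so the example runs without any model; they are not real BM25, dense, or cross-encoder implementations, and the corpus is made up:

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def cascade(query: str, corpus: Sequence[str],
            stages: Sequence[Tuple[Scorer, int]]) -> List[str]:
    """Run successive rerankers; each stage keeps only its top `keep` candidates."""
    candidates = list(corpus)
    for scorer, keep in stages:
        ranked = sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)
        candidates = ranked[:keep]
    return candidates


def term_overlap(query: str, doc: str) -> float:
    """Stage-1 stand-in for BM25: number of shared whitespace tokens."""
    return float(len(set(query.split()) & set(doc.split())))


def trigram_jaccard(query: str, doc: str) -> float:
    """Stage-2 stand-in for a dense encoder: Jaccard overlap of character trigrams."""
    q = {query[i:i + 3] for i in range(len(query) - 2)}
    d = {doc[i:i + 3] for i in range(len(doc) - 2)}
    return len(q & d) / len(q | d) if (q | d) else 0.0


def exact_phrase(query: str, doc: str) -> float:
    """Stage-3 stand-in for a cross-encoder: 1.0 if the query appears verbatim."""
    return 1.0 if query in doc else 0.0


corpus = [
    "late interaction retrieval with colbert",
    "retrieval of late fees and interaction logs",
    "sparse retrieval with bm25",
    "cooking pasta at home",
]
# Cheap scorer over everything, then costlier scorers over fewer candidates.
stages = [(term_overlap, 3), (trigram_jaccard, 2), (exact_phrase, 1)]
result = cascade("late interaction retrieval", corpus, stages)
```

The design point is the shrinking `keep` value: the expensive final scorer only ever sees a handful of candidates.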
What to know
- Traditional two-tower (dual-encoder) dense retrieval (e.g. DPR) is fast because each query and document is compressed into a single vector, but that compression can lose fine-grained token-level interactions (aclanthology.org, arxiv.org).
- Late interaction (ColBERT) stores per-token embeddings and computes MaxSim at query time: each query token is matched to its most similar document token, and those maxima are summed into the score. ColBERTv2 adds aggressive residual compression and denoised supervision, cutting the index footprint 6–10× while improving quality (arxiv.org).
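- The PLAID engine cuts ColBERTv2 search latency by up to 7× on GPU and 45× on CPU using centroid interaction and centroid pruning, while preserving state-of-the-art quality. The paper reports latency of tens of milliseconds on GPU, and tens to a few hundred milliseconds on CPU, even at 140M passages (arxiv.org).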
- SPLATE adapts ColBERTv2 for CPU by mapping token embeddings to a sparse vocabulary (via SPLADE), enabling <10 ms candidate generation and matching PLAID’s effectiveness (arxiv.org).
- Learned sparse retrievers: SPLADE-v3 reaches more than 40 MRR@10 on the MS MARCO dev set and improves out-of-domain BEIR results by 2% (arxiv.org). Mistral-SPLADE swaps in a decoder-only LLM backbone to further improve BEIR performance, and its authors report state-of-the-art results among sparse retrievers (arxiv.org).
- Generative retrieval (sequence-to-sequence models as retrievers) and RAG: on the RAG side, Atlas, a retrieval-augmented T5 with Fusion-in-Decoder, reaches over 42% accuracy on Natural Questions with only 64 training examples, outperforming a 540B-parameter model in the few-shot setting (jmlr.org, arxiv.org).
- Hybrid generative–dense retrieval: LIGER combines a generative candidate set with dense re-ranking to improve cold-start recall on recommendation benchmarks (Amazon Beauty, Steam), narrowing the gap with dense-only methods (reddit.com).
- Graph-enhanced RAG (GeAR) uses graph expansion around retrieved docs to boost multi-hop QA, improving MuSiQue performance by >10% and reducing token/iteration count (arxiv.org).
- On-disk vector indexes: DiskANN is integrated into SQL Server 2025 (public preview May 19 2025) and Azure Database for PostgreSQL (GA May 19 2025); Microsoft reports 10× faster queries and up to 96× lower memory use than pgvector's HNSW index (techcommunity.microsoft.com).
- Core libraries & frameworks:
  - Faiss (v1.11.0) continues to add RaBitQ, HNSW improvements, sharding, and GPU support for quantized indexes (github.com).
  - Hugging Face: transformers ships DPR model classes, and ColBERTv2 and SPLADE checkpoints are published on the Hub.
  - Open-source engines: Vespa, Weaviate, Milvus, Qdrant.
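The MaxSim scoring behind late interaction (ColBERT) is short enough to sketch directly. This is a minimal pure-Python illustration, assuming token embeddings are already L2-normalized so that a dot product equals cosine similarity; the 2-D vectors are invented toy values, and real systems use higher-dimensional embeddings plus the compression and pruning described above:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Two query tokens, unit-length toy embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]

# doc_a has a token aligned with each query token.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
# doc_b has tokens that only partially align with either query token.
doc_b = [[0.6, 0.8], [0.8, 0.6]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
ranking = sorted(["doc_a", "doc_b"],
                 key={"doc_a": score_a, "doc_b": score_b}.get,
                 reverse=True)
```

Because every query token is compared with every document token, cost grows with both lengths, which is the overhead that ColBERTv2 compression and PLAID pruning target.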
Starter sources
Late interaction & multi-vector
- ColBERT: “Efficient and Effective Passage Search via Contextualized Late Interaction over BERT” (SIGIR ’20) (arxiv.org)
- ColBERTv2: “Effective and Efficient Retrieval via Lightweight Late Interaction” (NAACL ’22) (aclanthology.org)
- PLAID: “An Efficient Engine for Late Interaction Retrieval” (CIKM ’22) (arxiv.org)
- SPLATE: “Sparse Late Interaction Retrieval” (SIGIR ’24) (arxiv.org)
Learned sparse retrieval
- SPLADE-v3: “New baselines for SPLADE” (arXiv Mar 2024) (arxiv.org)
- Mistral-SPLADE: “LLMs for better Learned Sparse Retrieval” (arXiv Aug 2024) (arxiv.org)
- Adapter-based SPLADE: Pal et al., “Parameter-Efficient Sparse Retrievers…” (arXiv Mar 2023) (arxiv.org)
Dense & two-tower
- DPR: “Dense Passage Retrieval for Open-Domain QA” (EMNLP ’20) (aclanthology.org, arxiv.org)
- Faiss: GitHub, “facebookresearch/faiss” (v1.11.0 changelog) (github.com)
Generative & hybrid retrieval
- Atlas: Izacard et al., “Few-shot Learning with Retrieval-Augmented Language Models” (JMLR 2023; arXiv Aug 2022) (jmlr.org, arxiv.org)
- LIGER: Meta AI, “LeveragIng dense retrieval for GEnerative Retrieval” (arXiv Nov 2024; summary on Reddit) (reddit.com)
- GeAR: “Graph-enhanced Agent for RAG” (arXiv Dec 2024) (arxiv.org)
- Survey: “The Survey of Retrieval-Augmented Text Generation…” (arXiv Apr 2024) (arxiv.org)
Infrastructure & indexing
- DiskANN in SQL Server 2025 (public preview May 19 2025) & Azure Database for PostgreSQL (GA May 19 2025) (techcommunity.microsoft.com)
- Vector DB quantization & SSD use cases: KIOXIA blog (PCIe 5.0 SSDs & DiskANN) (blog-us.kioxia.com)
- Vector DB integration: Azure Cosmos DB + DiskANN (Jun 2024) (techcommunity.microsoft.com)
Key people & orgs
- Matei Zaharia, Omar Khattab, Christopher Potts (ColBERT family)
- Patrick Lewis, Danqi Chen, Wen-tau Yih (DPR)
- Stéphane Clinchant, Hervé Déjean, Thibault Formal (SPLADE)
- Gautier Izacard, Sebastian Riedel (Atlas)
- Microsoft Research (DiskANN), Meta AI, Fujitsu Research
Tools & libraries
- Faiss (CPU/GPU, quantization, sharding)
- DiskANN (SQL Server 2025, Azure PostgreSQL)
- DPR & ColBERT implementations on Hugging Face
- SPLADE via 🤗 transformers and official library
- Qdrant, Milvus, Vespa, Weaviate for end-to-end retrieval systems
- rank_bm25 for quick BM25 prototyping (PyPI) (geeksforgeeks.org)
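For a sense of what a BM25 prototype computes under the hood, here is a minimal pure-Python sketch of Okapi BM25 scoring. It assumes whitespace-tokenized text, typical parameter values k1=1.5 and b=0.75, and a Lucene-style IDF; libraries such as rank_bm25 differ in IDF and smoothing details, so their scores will not match these exactly. The three-document corpus is made up for illustration:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF (an assumed variant); never negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


docs = [
    "dense retrieval encodes queries and passages as vectors".split(),
    "bm25 ranks passages by term frequency and rarity".split(),
    "graph indexes speed up nearest neighbor search".split(),
]
scores = bm25_scores("bm25 term frequency".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents that share no term with the query score exactly zero, which is the vocabulary-mismatch limitation that dense and learned sparse retrievers try to address.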
This baseline equips you to dive deeper into retrieval innovations. Expect regular DeltaDrops on late-interaction advances, sparse/dense hybrids, generative retrieval trends, and fast-growing vector-search infrastructure.