Retrieval at Scale | Drop for 2026-02-19

TL;DR

Notable retrieval updates since Feb 10, 2026:
- Milvus 2.6.11 improves hybrid filtering, segment loading, and Storage V2 I/O pipelining.
- Vespa’s Feb 2026 newsletter announces faster JSON rendering, a new CBOR result format, Pyvespa 1.0 latency gains, a Kubernetes Operator, and GCP Marketplace availability.
- OpenSearch 3.5 (Feb 10) delivers up to 58% higher vector‑search throughput via FP16 SIMD and expands agentic capabilities.
- Lucene 10.4.0 advanced to RC2 after a re‑spin to fix a sorted‑query regression, so the 10.4 release looks imminent.
- New research (KHI) proposes a multi‑attribute filtered‑ANN index that reports 2.46× average QPS gains (up to 16.22×) at high recall.

Milvus 2.6.11: faster filtering, loading, and on‑disk I/O pipelining

Key facts and current state of the topic
- Milvus is widely used as a first‑stage ANN engine in hybrid pipelines; correctness and tail‑latency hinge on filtering and segment loads. (milvus.io)
Important context and background information
- Recent 2.6.x updates added FP16/BF16 auto‑conversion and hybrid controls; 2.6.11 extends this with Storage V2 and filtering optimizations. (milvus.io)
Recent developments or changes
- Feb 12: 2.6.11 reduces CPU cost in filter execution, adds sparse‑filter support in search, speeds segment loading (load‑diff patches), enables true I/O pipelining via LoadWithStrategyAsync, and adds warmup controls. It also fixes WAL recovery and index‑build edge cases. Consider upgrading if you need steadier p95/p99 under hybrid re‑ranking. (milvus.io)
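To judge whether an upgrade actually steadies p95/p99, it helps to compute tail percentiles the same way before and after. The sketch below is a minimal, engine-agnostic helper using only the Python standard library; the function name and the sample data are hypothetical and it does not call any Milvus API.

```python
import statistics

def tail_latency(samples_ms):
    """Return (p50, p95, p99) for a list of latency samples in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94], cuts[98]

if __name__ == "__main__":
    # Hypothetical samples: replace with latencies recorded from your own load test.
    samples = list(range(1, 102))  # 1 ms .. 101 ms
    p50, p95, p99 = tail_latency(samples)
    print(f"p50={p50} ms  p95={p95} ms  p99={p99} ms")
```

Run it on identical query mixes against the old and new versions, and compare p95 and p99 rather than the mean, since filter and segment-load regressions tend to show up in the tail first.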

Vespa February 2026: faster result delivery, Pyvespa 1.0, and easier ops

Key facts and current state of the topic
- Vespa underpins large hybrid and multi‑stage ranking systems; response serialization can dominate latency when returning many chunks to LLMs or late‑interaction rerankers. (blog.vespa.ai)
Important context and background information
- Moving from JSON to a compact binary format and improving client stacks reduces end‑to‑end latency without changing retrieval logic. (blog.vespa.ai)
Recent developments or changes
- Feb 16 newsletter: JSON response generation is >2× faster, and a new CBOR result format yields smaller, quicker payloads. Pyvespa 1.0 switches its HTTP client and uses CBOR for ~4.9× lower latency in a tensor‑heavy benchmark. A Kubernetes Operator and a GCP Marketplace listing ease deployments. (blog.vespa.ai)
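To see why a binary result format shrinks tensor-heavy responses, the sketch below compares a JSON encoding of an embedding with a simple packed float32 encoding. This is an illustration under stated assumptions: it uses Python's `struct` module as a stand-in for a compact binary format, it is not CBOR, and it is not Vespa's wire format. The function names and the `embedding` field are hypothetical.

```python
import json
import struct

def encode_json(values):
    """Encode a vector as a UTF-8 JSON document (text digits per float)."""
    return json.dumps({"embedding": values}).encode("utf-8")

def encode_binary(values):
    """Encode a vector as a 4-byte length prefix plus little-endian float32s."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)

def decode_binary(payload):
    """Inverse of encode_binary; values come back at float32 precision."""
    (n,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{n}f", payload, 4))

if __name__ == "__main__":
    vec = [i / 7 for i in range(128)]  # hypothetical 128-dim embedding
    print("json bytes:  ", len(encode_json(vec)))
    print("binary bytes:", len(encode_binary(vec)))
```

The binary form costs a fixed 4 bytes per component, while JSON spends one byte per printed digit plus separators. Actual savings with CBOR depend on how the server encodes tensors, so measure on real responses.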

OpenSearch 3.5 GA: vector‑search throughput + agentic memory

Key facts and current state of the topic
- OpenSearch is broadly deployed for vector + lexical hybrids; 3.5 emphasizes performance and agentic capabilities. (opensearch.org)
Important context and background information
- SIMD and mixed‑precision support can raise candidate budgets at fixed SLAs—useful for ads retrieval with filters/rerankers. (docs.opensearch.org)
Recent developments or changes
- Released Feb 10: the release documentation calls out up to 58% vector‑search throughput improvement via bulk SIMD FP16 operations and new “agentic conversation memory” for AI apps. Review the release notes before planning upgrades in managed environments. (docs.opensearch.org)
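The precision and memory trade-off behind FP16 vectors can be sketched with the standard library alone. The example below rounds a vector to IEEE 754 half precision using `struct`'s `e` format and compares dot products. It does not reproduce OpenSearch's SIMD kernels or any OpenSearch API; the helper names and sample vectors are hypothetical.

```python
import struct

def to_fp16(vec):
    """Round each component to half precision; return (rounded values, byte size)."""
    packed = struct.pack(f"<{len(vec)}e", *vec)  # 'e' = IEEE 754 binary16
    return list(struct.unpack(f"<{len(vec)}e", packed)), len(packed)

def dot(a, b):
    """Plain dot product, used here to compare scores before and after rounding."""
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    query = [0.1, -0.25, 0.5, 0.75]   # hypothetical query vector
    doc = [0.3, 0.2, -0.1, 0.9]       # hypothetical document vector
    q16, q_bytes = to_fp16(query)
    d16, _ = to_fp16(doc)
    print("fp32 bytes:", struct.calcsize(f"<{len(query)}f"), "fp16 bytes:", q_bytes)
    print("score full precision:", dot(query, doc))
    print("score after fp16 rounding:", dot(q16, d16))
```

Halving bytes per component means more vectors fit per cache line and per SIMD register, which is where throughput gains of this kind generally come from. Whether the rounding error is acceptable depends on your recall targets.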

Lucene 10.4.0: RC2 re‑spin signals near‑term release

Key facts and current state of the topic
- Lucene improvements flow into Elasticsearch/OpenSearch and Lucene‑based stacks powering candidate generation. (lucene.apache.org)
Important context and background information
- A regression discovered during voting (sorted queries with SortedSetDocValues and skippers) prompted backports, indicating active performance hardening. (mail-archive.com)
Recent developments or changes
- Feb 12: RC2 re‑spun after landing the fix and backports to 10.x/10.4 branches; watch for 10.4 GA and measure effects on lexical + kNN pipelines with filters. (mail-archive.com)

Research: multi‑attribute filtered‑ANN (KHI) boosts QPS at high recall

Key facts and current state of the topic
- Filtered ANN often degrades under multi‑attribute range predicates; most systems optimize for single‑attribute filters. (arxiv.org)
Important context and background information
- The proposed KHI index combines attribute‑space partitioning with HNSW per node; a skew‑aware split keeps tree height O(log n) while preserving recall. (arxiv.org)
Recent developments or changes
- Feb 17 preprint reports 2.46× average QPS (up to 16.22× on a hard dataset) at high recall. This is a promising direction for heavy multi‑filter ads/search workloads; compare it against ACORN‑style traversal or strict post‑filter baselines. (arxiv.org)
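The general idea of partitioning attribute space before vector search can be sketched in a few lines. The toy below is not the KHI algorithm: it uses a fixed grid instead of a skew-aware tree, and brute-force distance ranking inside each cell instead of HNSW. The `Item` fields (`price`, `year`), the `GridIndex` class, and all parameters are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Item:
    vec: tuple    # embedding
    price: float  # first filter attribute (hypothetical)
    year: int     # second filter attribute (hypothetical)

def matches(item, price_rng, year_rng):
    """True if the item satisfies both inclusive range predicates."""
    return (price_rng[0] <= item.price <= price_rng[1]
            and year_rng[0] <= item.year <= year_rng[1])

class GridIndex:
    """Toy attribute-space partition: a fixed grid over (price, year)."""

    def __init__(self, items, price_step, year_step):
        self.price_step, self.year_step = price_step, year_step
        self.cells = {}
        for it in items:
            self.cells.setdefault(self._cell(it.price, it.year), []).append(it)

    def _cell(self, price, year):
        return (int(price // self.price_step), int(year // self.year_step))

    def search(self, query, k, price_rng, year_rng):
        # Visit only the cells that overlap the query's attribute box.
        lo = self._cell(price_rng[0], year_rng[0])
        hi = self._cell(price_rng[1], year_rng[1])
        candidates = []
        for (cp, cy), bucket in self.cells.items():
            if lo[0] <= cp <= hi[0] and lo[1] <= cy <= hi[1]:
                # Boundary cells can hold non-matching items, so re-check each one.
                candidates.extend(it for it in bucket if matches(it, price_rng, year_rng))
        # Brute-force ranking stands in for a per-partition ANN graph.
        return sorted(candidates, key=lambda it: math.dist(it.vec, query))[:k]

def exhaustive_filtered_search(items, query, k, price_rng, year_rng):
    """Ground truth: rank every item by distance, then keep the first k that match."""
    ranked = sorted(items, key=lambda it: math.dist(it.vec, query))
    return [it for it in ranked if matches(it, price_rng, year_rng)][:k]
```

When evaluating a real filtered-ANN index, an exhaustive filtered scan like `exhaustive_filtered_search` gives the ground truth for recall. QPS comparisons are only meaningful at matched recall.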