TL;DR
- Weaviate 1.38.0 is generally available, and its HFresh disk‑oriented vector index graduates to GA; Namespaces and Nested Object Filtering enter preview for multi‑tenant and semi‑structured workloads. (github.com)
- Milvus 2.6.18 (June 5) adds nullable vectors, element‑level search over Struct arrays, HTTP/2 for the proxy, and scheduler and stability improvements, useful for hybrid retrieval and constantly updated corpora. (milvus.io)
- FLASH‑MAXSIM (arXiv, May 28) introduces fused GPU kernels for MaxSim scoring in late‑interaction models, reporting up to 3.9× (A100) / 4.7× (H100) speedups and large memory savings while matching the FP32 reference's top‑20 results. (arxiv.org)
- Vespa shows classic sparse retrieval still has headroom: adding rank features to BM25 lifted MS MARCO MRR@10 by ~10.8% and generalized better out‑of‑domain. (blog.vespa.ai)
- Azure AI Search adds a “retrieval reasoning effort” knob (preview, 2026‑05‑01 API) that trades latency and cost against retrieval depth in agentic retrieval; at higher effort it can run one iterative refetch for harder queries. (learn.microsoft.com)
Weaviate 1.38.0 GA: HFresh graduates; Namespaces and Nested Object Filtering (preview)
- Key facts and current state of the topic
- Weaviate 1.38.0 is now generally available. HFresh, a disk‑oriented index inspired by SPFresh, moves to GA, keeping the index fresh under continuous updates with a smaller RAM footprint. Namespaces (control‑plane/data isolation) and Nested Object Filtering are introduced in preview. (github.com)
- Important context and background information
- Late‑interaction and heavily filtered workloads often outgrow in‑RAM HNSW. A disk‑optimized first stage plus multi‑tenant isolation reduces tail‑latency variance and simplifies operating large multi‑team clusters. (github.com)
- Recent developments or changes
- The 1.38.0 release notes list HFresh performance and footprint optimizations and the initial Namespaces and Nested Object Filtering APIs. Plan canaries on filter‑heavy shards before a broad rollout. (github.com)
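To make the SPFresh idea concrete, here is a toy, in-memory Python sketch of a partition-based index: a small set of centroids stays in RAM, vectors live in per-centroid posting lists (standing in for disk pages), inserts append in place, and an oversized posting is split in two instead of rebuilding the whole index. It is not Weaviate's HFresh code or API; the class and parameter names (`ToyPartitionIndex`, `max_posting`, `nprobe`) are hypothetical, and the sketch omits SPFresh's reassignment of neighboring vectors after a split.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class ToyPartitionIndex:
    """Hypothetical toy: centroids stay in RAM; each posting list stands in for a disk page."""

    def __init__(self, max_posting=4):
        self.max_posting = max_posting
        self.centroids = []  # small in-memory routing layer
        self.postings = []   # postings[i] = [(id, vector), ...], the "disk" part

    def _nearest(self, vec, n=1):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def insert(self, vid, vec):
        if not self.centroids:
            self.centroids.append(list(vec))
            self.postings.append([(vid, vec)])
            return
        p = self._nearest(vec)[0]
        self.postings[p].append((vid, vec))  # in-place append, no global rebuild
        if len(self.postings[p]) > self.max_posting:
            self._split(p)

    def _split(self, p):
        """Split an oversized posting in two (a crude 2-means step)."""
        members = self.postings[p]
        a = members[0][1]
        b = max(members, key=lambda m: l2(a, m[1]))[1]  # member farthest from a
        left = [m for m in members if l2(m[1], a) <= l2(m[1], b)]
        right = [m for m in members if l2(m[1], a) > l2(m[1], b)]
        if not right:
            return  # all vectors identical; keep the oversized posting
        self.centroids[p] = mean([v for _, v in left])
        self.postings[p] = left
        self.centroids.append(mean([v for _, v in right]))
        self.postings.append(right)

    def search(self, query, k=2, nprobe=1):
        candidates = []
        for p in self._nearest(query, nprobe):
            candidates.extend(self.postings[p])  # one "disk read" per probed posting
        candidates.sort(key=lambda m: l2(query, m[1]))
        return [vid for vid, _ in candidates[:k]]


idx = ToyPartitionIndex(max_posting=3)
points = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.1],
          "d": [5.0, 5.0], "e": [5.1, 5.0]}
for vid, vec in points.items():
    idx.insert(vid, vec)
print(len(idx.postings), idx.search([5.0, 5.1], k=2, nprobe=1))  # prints: 2 ['d', 'e']
```

Search cost is governed by `nprobe`: each probed partition corresponds to one posting-list read, which is what lets a disk-oriented design keep only the centroids in memory.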
Milvus 2.6.18 (June 5): nullable vectors, element‑level Struct search, HTTP/2
- Key facts and current state of the topic
- Milvus v2.6.18 adds nullable vector fields (rows can be stored without an embedding, and missing embeddings are skipped at query time) and element‑level vector search over Struct arrays (results identify the matched element and its offset). It also enables HTTP/2 on the proxy and improves QueryNode/QueryCoord scheduling. (milvus.io)
- Important context and background information
- These features help hybrid/near‑real‑time pipelines where embeddings arrive asynchronously and where fine‑grained signals (arrays/Structs) matter for relevance or compliance tagging. (milvus.io)
- Recent developments or changes
- Release date: June 5, 2026. The notes call out deadline‑aware admission, better recovery under load, and fewer allocations, all aimed at steadier p95/p99 latency during traffic spikes. (milvus.io)
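A brute-force Python sketch of the two query semantics described above, using plain dicts rather than pymilvus: rows with a null (`None`) vector are excluded from vector search, and element-level search ranks individual array elements and reports the owning row and the element offset. The row layout (`vec`, `chunks`, `emb`) and the function names are hypothetical; Milvus's actual schema, index, and result formats are defined in its own documentation.

```python
def dot(a, b):
    """Inner-product similarity."""
    return sum(x * y for x, y in zip(a, b))


def search_nullable(rows, query, k):
    """Top-k row ids by inner product; rows whose vector is None are skipped."""
    scored = [(dot(query, r["vec"]), r["id"]) for r in rows if r["vec"] is not None]
    scored.sort(key=lambda s: -s[0])
    return [rid for _, rid in scored[:k]]


def search_struct_elements(rows, query, k):
    """Rank individual array elements and return (row id, element offset) pairs."""
    hits = []
    for r in rows:
        for offset, elem in enumerate(r.get("chunks") or []):
            hits.append((dot(query, elem["emb"]), r["id"], offset))
    hits.sort(key=lambda h: -h[0])
    return [(rid, offset) for _, rid, offset in hits[:k]]


# Hypothetical rows: "vec" is a nullable vector field, "chunks" an array of structs.
rows = [
    {"id": 1, "vec": [1.0, 0.0], "chunks": [{"emb": [0.0, 1.0]}, {"emb": [0.9, 0.1]}]},
    {"id": 2, "vec": None, "chunks": [{"emb": [1.0, 0.0]}]},  # embedding not computed yet
    {"id": 3, "vec": [0.6, 0.8], "chunks": []},
]
query = [1.0, 0.0]
print(search_nullable(rows, query, k=2))         # prints: [1, 3]
print(search_struct_elements(rows, query, k=2))  # prints: [(2, 0), (1, 1)]
```

Row 2 is invisible to the row-level search because its vector is still null, yet its array element is the best element-level match; returning the offset is what lets a caller point at the exact matching chunk.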
FLASH‑MAXSIM: fused GPU kernels for late‑interaction (MaxSim) scoring
- Key facts and current state of the topic
- Late‑interaction scoring (e.g., ColBERT, ColPali) becomes memory‑bound when the full query‑token × document‑token similarity tensor is materialized before the max reduction. FLASH‑MAXSIM fuses the similarity computation with the reduction, so that tensor is never materialized. (arxiv.org)
- Important context and background information
- Production late‑interaction stacks often hit GPU memory/throughput ceilings in both training and serving; IO‑aware kernels can unlock larger batches and corpora without changing model semantics. (arxiv.org)
- Recent developments or changes
- The paper reports up to 3.9× (A100) / 4.7× (H100) speedups, ~16× less inference memory and ~28× less training memory, with 100% top‑20 agreement against an FP32 reference. Worth piloting for ColBERT‑class rerankers. (arxiv.org)
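The reduction that FLASH-MAXSIM fuses can be shown with a pure-Python sketch (no GPU, and no claim about the paper's kernels): MaxSim sums, over query tokens, the maximum dot product with any document token. The first function materializes the full similarity matrix; the second streams document tokens in tiles and folds each similarity into a running max, producing the same score without ever holding the matrix. The function names and the `tile` parameter are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_materialized(Q, D):
    """Reference MaxSim: build the full |Q| x |D| similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in D] for q in Q]
    return sum(max(row) for row in sim)


def maxsim_fused(Q, D, tile=2):
    """Same score, but each similarity is folded into a running max as soon as it
    is computed, so only one tile of document tokens is touched at a time and the
    full matrix never exists."""
    running = [float("-inf")] * len(Q)  # one running max per query token
    for start in range(0, len(D), tile):
        block = D[start:start + tile]  # on a GPU this tile would sit in on-chip memory
        for i, q in enumerate(Q):
            for d in block:
                s = dot(q, d)
                if s > running[i]:
                    running[i] = s
    return sum(running)


Q = [[1.0, 0.0], [0.0, 1.0]]              # query token embeddings
D = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]  # document token embeddings
print(maxsim_materialized(Q, D), maxsim_fused(Q, D))  # prints: 1.5 1.5
```

Because both versions compute identical dot products and take the same maxima, the fused form changes memory traffic, not the score, which is the property behind the paper's exact-ranking claim.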
Vespa: BM25 (+ rank features) re‑examined—10.8% MRR@10 lift, better generalization
- Key facts and current state of the topic
- Vespa’s May 29 analysis re‑implements BM25 on MS MARCO and shows that adding a small set of rank features lifts MRR@10 from 0.1901 to 0.2106 (a ~10.8% relative gain) while doubling generalization on a held‑out set. (blog.vespa.ai)
- Important context and background information
- For ad/search use cases, lexical/sparse baselines (and feature engineering) still provide strong, cost‑efficient candidates and can ease pressure on ANN/LLM budgets. (blog.vespa.ai)
- Recent developments or changes
- The write‑up includes configs and plots. Consider hybrid candidate generation (BM25 plus sparse or dense retrieval) followed by a lightweight learned reranker before heavier stages. (blog.vespa.ai)
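For readers rebuilding the baseline, here is a minimal Python sketch of Okapi BM25, MRR@10, and the relative-lift arithmetic behind the ~10.8% figure. It assumes a Lucene-style IDF and the common defaults k1 = 1.2, b = 0.75 rather than taking them from Vespa's configs. It does not reproduce Vespa's rank-feature set, and the toy documents and query are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 with a Lucene-style IDF; whitespace tokenization, no stemming."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def mrr_at_10(rankings, relevant):
    """Mean reciprocal rank of the first relevant doc in the top 10 (0 if none)."""
    total = 0.0
    for qid, ranking in rankings.items():
        for rank, doc_id in enumerate(ranking[:10], start=1):
            if doc_id in relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(rankings)


def relative_lift(baseline, improved):
    return (improved - baseline) / baseline


docs = ["bm25 ranks documents by term frequency",
        "dense vectors encode meaning",
        "bm25 and rank features"]
scores = bm25_scores("bm25 rank features", docs)
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(ranking)  # prints: [2, 0, 1]
print(mrr_at_10({"q1": ranking, "q2": [1, 0, 2]}, {"q1": {2}, "q2": {0}}))  # prints: 0.75
print(round(relative_lift(0.1901, 0.2106), 3))  # prints: 0.108
```

The last line shows that the quoted lift is relative to the baseline: (0.2106 − 0.1901) / 0.1901 ≈ 0.108, whereas the absolute MRR@10 gain is about 0.02.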
Azure AI Search: “retrieval reasoning effort” (preview) for agentic retrieval
- Key facts and current state of the topic
- A new preview setting (2026‑05‑01 API) lets you pick minimal/low/medium “reasoning effort” for agentic retrieval—controlling LLM‑based query planning, subquery fan‑out, and an optional second pass when initial results are weak. (learn.microsoft.com)
- Important context and background information
- The setting is useful when tuning recall, latency, and token‑spend trade‑offs across mixed knowledge sources; “medium” can trigger a classifier‑gated iterative search and higher ranking limits in supported regions. (learn.microsoft.com)
- Recent developments or changes
- Docs updated June 11, 2026. The setting can be applied as a knowledge‑base default or overridden per request. The feature remains in preview; validate behavior and costs before production use. (learn.microsoft.com)
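To illustrate the trade-off the effort levels expose, here is a toy Python sketch of effort-gated retrieval. The effort level caps subquery fan-out, and only the highest level may run one extra pass when a simple gate judges the first pass weak. Everything here is hypothetical: `EffortProfile`, `PROFILES`, the fan-out numbers, the token-overlap gate standing in for the classifier, and the string-replace rewrite standing in for LLM reformulation. It does not call Azure AI Search or mirror its request schema, which the preview docs define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortProfile:
    max_subqueries: int  # hypothetical cap on query-planning fan-out
    allow_refetch: bool  # hypothetical: may a weak first pass trigger one retry?


# Hypothetical mapping for illustration only; not Azure AI Search's actual settings.
PROFILES = {
    "minimal": EffortProfile(max_subqueries=1, allow_refetch=False),
    "low": EffortProfile(max_subqueries=2, allow_refetch=False),
    "medium": EffortProfile(max_subqueries=3, allow_refetch=True),
}

CORPUS = {
    "d1": "reset a password in the admin portal",
    "d2": "rotate api keys for the search service",
    "d3": "billing and invoices for the search service",
}


def search(query, k=2):
    """Toy lexical search: score = number of tokens shared with the document."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: -s[0])
    return scored[:k]


def plan_subqueries(query, limit):
    """Stand-in for LLM query planning: split on ' and ', capped at `limit`."""
    return [p.strip() for p in query.split(" and ") if p.strip()][:limit]


def rewrite(query):
    """Stand-in for LLM reformulation used by the single refetch."""
    return query.replace("credentials", "api keys")


def run_pass(subqueries, merged):
    """Search each subquery and keep the best score seen per document."""
    for sub in subqueries:
        for score, doc_id in search(sub):
            merged[doc_id] = max(score, merged.get(doc_id, 0))
    return merged


def agentic_retrieve(query, effort="low", weak_below=2):
    profile = PROFILES[effort]
    merged = run_pass(plan_subqueries(query, profile.max_subqueries), {})
    # Stand-in for the gating classifier: the first pass is "weak" if no
    # document shares at least `weak_below` tokens with any subquery.
    if profile.allow_refetch and max(merged.values(), default=0) < weak_below:
        run_pass([rewrite(query)], merged)  # at most one extra pass
    return [doc_id for doc_id, _ in sorted(merged.items(), key=lambda kv: -kv[1])]


print(agentic_retrieve("change credentials", "low"))     # prints: []
print(agentic_retrieve("change credentials", "medium"))  # prints: ['d2']
print(agentic_retrieve("reset password and rotate credentials", "minimal"))  # prints: ['d1']
```

Higher effort buys recall on hard queries, here the refetch that recovers `d2`, at the cost of extra passes. Lower effort bounds latency and spend by capping fan-out and skipping the retry.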