Retrieval at Scale | Drop for 2025-10-31

TL;DR

A quiet but not empty week (Oct 24–31, 2025). Three items worth your time: (1) Weaviate 1.33.3 ships storage‑latency and multi‑DC clustering improvements that matter for high‑QPS retrieval; (2) the Weaviate 1.34.0‑rc.0 preview introduces SPFresh‑style clustered vector storage and server‑side dynamic batching, both promising for freshness and throughput; (3) Weaviate Cloud moves to a simpler pricing model (effective Oct 27), with costs tied to vector dimensions, storage, backups, and configuration choices such as compression and regions.

Weaviate 1.33.3: lower tail‑latency in storage + multi‑DC Raft support

  • Key facts and current state of the topic
    • Open‑source Weaviate v1.33.3 was released on October 29, 2025. Highlights include a rework of LSM store synchronization to prevent lock contention and latency spikes, Raft communication changes that add multi‑data‑center support, and minor fixes (e.g., RQ bits export for dynamic indexes). (github.com)
  • Important context and background information
    • LSM write/read contention and replication topology often dominate p95/p99 latency in vector DBs at ads scale. Multi‑region and high‑availability requirements make Raft networking choices material to consistency and failover.
  • Recent developments or changes
    • Expect more stable p95/p99 latency under ingest or compaction pressure and simpler multi‑DC cluster layouts, which is useful if you run filtered ANN plus multi‑stage ranking with tight SLOs. Evaluate on production shards during peak traffic, especially with hybrid (BM25+vector) requery enabled. (github.com)
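
For the before/after evaluation suggested above, a minimal sketch of a tail‑latency comparison may help. It assumes you have already collected per‑query latencies (in milliseconds) from your own load test on each version; the function names are hypothetical and nothing here calls a Weaviate API.

```python
def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 1 <= q <= 100:
        raise ValueError("q must be an integer between 1 and 100")
    ordered = sorted(samples)
    # ceil(q * n / 100) in integer arithmetic, avoiding float rounding.
    rank = (q * len(ordered) + 99) // 100
    return ordered[rank - 1]


def tail_summary(latencies_ms):
    """Return the p50/p95/p99 of one run."""
    return {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}


def tail_delta(before_ms, after_ms):
    """Per-percentile change (after minus before); negative means faster."""
    before = tail_summary(before_ms)
    after = tail_summary(after_ms)
    return {name: after[name] - before[name] for name in before}
```

Run it on samples taken under the same ingest and compaction load for both versions; comparing a quiet run against a busy one would hide exactly the contention this release targets.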

Weaviate 1.34.0‑rc.0 (preview): SPFresh‑style clustered vector storage + dynamic batching

  • Key facts and current state of the topic
    • A release candidate for 1.34 introduces “SPFresh Clustered Vector Storage (Phase 1),” server‑side dynamic batching (preview), and Rotational Quantization (RQ) support for flat indexes. (github.com)
  • Important context and background information
    • SPFresh‑class clustering targets fresher, on‑disk‑friendly vector search with better update/query trade‑offs—relevant for constantly changing catalogs (ads, marketplace). Dynamic batching helps stabilize throughput/latency under bursty QPS without client‑side micro‑batch logic.
  • Recent developments or changes
    • If you’re evaluating disk‑oriented or continuously updated indexes, this RC is a timely sandbox. Pair with late‑interaction rerankers to test recall/latency envelopes before GA. (github.com)
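
To put numbers on the recall side of that recall/latency envelope, one option is to compare the IDs the RC returns against an exact (brute‑force) baseline for the same queries. The sketch below assumes you already have both ID lists from your own harness; the function names are hypothetical and not part of any Weaviate client.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact baseline is empty")
    hits = len(truth.intersection(approx_ids[:k]))
    return hits / len(truth)


def mean_recall_at_k(approx_runs, exact_runs, k):
    """Average recall@k over a set of queries (one ID list per query)."""
    if not exact_runs or len(approx_runs) != len(exact_runs):
        raise ValueError("need one approximate and one exact result per query")
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs)]
    return sum(scores) / len(scores)
```

Recording mean recall alongside p95/p99 for each index configuration gives one point on the envelope; repeating the measurement after a burst of updates shows how freshness affects it.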

Weaviate Cloud: pricing model update (effective Oct 27)

  • Key facts and current state of the topic
    • Weaviate announced a simpler Cloud pricing model effective October 27, 2025, with costs tied to vector dimensions, storage, backups, and configuration choices (e.g., compression/regions). (weaviate.io)
  • Important context and background information
    • Budgeting for retrieval at scale often hinges on vector footprint (dimension, quantization) and region/HA choices. Transparent pricing mapped to these levers makes capacity planning for hybrid/late‑interaction stacks easier.
  • Recent developments or changes
    • For pilots of multi‑vector/late‑interaction embeddings, turn on compression (e.g., RQ) and validate the cost/QPS trade‑off under the new model before scaling tenants. (weaviate.io)
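
As a rough aid for that cost check, the raw vector footprint can be estimated from object count, dimensions, and bits per dimension. This is a back‑of‑the‑envelope sketch only: the bit widths are illustrative assumptions rather than statements about a specific codec, it ignores index and metadata overhead, and it contains no pricing figures.

```python
def vector_bytes(num_objects, dim, bits_per_dim=32, vectors_per_object=1):
    """Raw bytes needed to store the vectors alone (no index, no metadata)."""
    if min(num_objects, dim, bits_per_dim, vectors_per_object) <= 0:
        raise ValueError("all arguments must be positive")
    total_bits = num_objects * vectors_per_object * dim * bits_per_dim
    return (total_bits + 7) // 8  # round up to whole bytes


# Illustrative comparison: 1M objects with one 768-dim vector each.
uncompressed = vector_bytes(1_000_000, 768)               # float32
compressed = vector_bytes(1_000_000, 768, bits_per_dim=8)  # assumed 8-bit codes
ratio = uncompressed / compressed

# Multi-vector case: 32 token vectors of 128 dims per object (assumed shape).
multi_vector = vector_bytes(1_000_000, 128, bits_per_dim=8, vectors_per_object=32)
```

Multi‑vector embeddings multiply the footprint by the number of vectors per object, which is why compression settings matter more for late‑interaction pilots than for single‑vector ones.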

If you were expecting more: beyond these, we did not find additional late‑interaction, learned‑sparse, or ANN‑infra releases/papers between October 24–31, 2025 that met the bar for inclusion.