Elasticsearch

Widely used distributed search engine built on Apache Lucene and maintained by Elastic. A common backend for production search in e-commerce, log analytics, and enterprise search.


What It Does

Elasticsearch indexes documents as JSON and exposes a REST API for full-text search, aggregations, filtering, and (since v8) dense vector ANN search. Horizontally scalable via sharding and replication.
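Here is a sketch of what a request looks like. The helper below builds a `_search` body that combines three pieces: a scored `match` clause, a non-scoring `term` filter, and a `terms` aggregation. The helper name and the fields `title`, `brand` and `color` are hypothetical. The code only assembles the JSON. Sending it (for example to `POST /<index>/_search`) is left to whatever HTTP or Elasticsearch client is in use.

```python
import json


def build_search_body(text: str, brand: str, size: int = 10) -> dict:
    """Assemble a _search request body (helper and field names are hypothetical)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text clause; relevance is BM25 by default.
                "must": [{"match": {"title": text}}],
                # Filter context: narrows the hits without affecting scores.
                "filter": [{"term": {"brand": brand}}],
            }
        },
        # Facet counts over the matching documents.
        "aggs": {"by_color": {"terms": {"field": "color"}}},
    }


body = build_search_body("trail running shoes", "acme")
print(json.dumps(body, indent=2))
```

The `bool` query separates scoring (`must`) from filtering (`filter`). Filter clauses skip relevance computation and can be cached.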

Key capabilities:

  • Full-text search: BM25-based relevance scoring (Lucene underneath)
  • Dense vector search: knn query over an HNSW index; supports int8 Scalar Quantization and Better Binary Quantization (BBQ)
  • Hybrid search: BM25 and kNN results combined via Reciprocal Rank Fusion (RRF) or weighted score combination
  • Sparse retrieval: ELSER (Elastic Learned Sparse Encoder) for zero-shot semantic retrieval without dense embeddings
  • Aggregations: faceted counts, histograms, metrics at scale
  • Boosting: function_score query for Results Boosting (popularity, margin, recency)
  • Filtering: pre-filter and post-filter support for knn queries (Vector Filtering)
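Below is a toy version of the BM25 scoring that underlies full-text relevance. It uses Lucene's default parameters `k1 = 1.2` and `b = 0.75`. It follows the form used in current Lucene versions, which omit the constant `(k1 + 1)` numerator factor of textbook BM25; this does not change rankings. This is a simplified sketch. Tokenization is plain whitespace splitting and document lengths are exact. Lucene instead applies analyzers and stores lengths in a lossy encoding, so real Elasticsearch scores will differ somewhat.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against query_terms (simplified BM25)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent: contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] / norm
        scores.append(score)
    return scores


docs = [
    "red running shoes".split(),
    "blue running jacket".split(),
    "red wine glass set".split(),
]
print(bm25_scores(["red", "running"], docs))
```

The first document matches both terms and ranks highest. The other two each match one term with the same IDF, so the shorter document wins on length normalization.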

Retrieval Options

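| Mode | Query type | Notes |
|---|---|---|
| BM25 | match, multi_match | Default; fast; no ML dependency |
| Sparse semantic | sparse_vector + ELSER | Zero-shot; uses ELSER instead of a dense embedding model |
| Dense semantic | knn | Requires an embedding model at index and query time |
| Hybrid | knn + match combined | BM25 and kNN scores live on different scales; fuse by rank (RRF) or normalize before linear weighting |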
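Reciprocal Rank Fusion sidesteps the scale mismatch in the hybrid row. Each document's fused score is the sum of `1 / (rank_constant + rank)` over the result lists it appears in. Elasticsearch's RRF uses a default `rank_constant` of 60. The function below is a standalone sketch of the formula, not the Elasticsearch API. Its name and the `size` cut-off parameter are illustrative choices.

```python
from collections import defaultdict


def rrf_fuse(result_lists, rank_constant=60, size=10):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Only 1-based ranks are used, so BM25 and kNN scores never need to be
    normalized onto a common scale.
    """
    fused = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    ordered = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


bm25_hits = ["d1", "d2", "d3"]
knn_hits = ["d3", "d1", "d4"]
for doc_id, score in rrf_fuse([bm25_hits, knn_hits]):
    print(doc_id, round(score, 5))
```

Documents found by both retrievers (d1, d3) outrank those found by only one. A larger `rank_constant` flattens the gap between top and lower positions.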

Quantization (Dense Vectors)

| Method | Compression | Notes |
|---|---|---|
| int8 Scalar Quantization | 4× | Near-lossless; default for kNN |
| BBQ (Better Binary Quantization) | 32× | Elastic's production binary quantization scheme |
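The compression ratios follow from bits per dimension. float32 uses 32 bits, int8 uses 8 (4×), and one sign bit gives 32×. The sketch below is deliberately simplified. Lucene's scalar quantizer picks its range from data quantiles and stores correction terms. BBQ also keeps corrective values beyond the raw bits. Elasticsearch additionally retains the original float vectors on disk. The ratios therefore describe the quantized vectors only, and the function names here are illustrative.

```python
import array


def scalar_quantize_int8(vec, lo, hi):
    """Linearly map floats in [lo, hi] onto signed bytes in [-128, 127].

    Simplified: the range is passed in rather than estimated from data.
    """
    scale = 255.0 / (hi - lo)
    codes = []
    for x in vec:
        clipped = min(max(x, lo), hi)  # out-of-range values saturate
        codes.append(round((clipped - lo) * scale) - 128)
    return array.array("b", codes)


def binary_quantize(vec):
    """Keep one sign bit per dimension, packed 8 dimensions per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


dims = 384
# Synthetic, deterministic values in [-1, 1]; not real embeddings.
vec = [((i * 37) % 101) / 50.0 - 1.0 for i in range(dims)]

raw = array.array("f", vec)  # float32 storage
codes = scalar_quantize_int8(vec, -1.0, 1.0)
bits = binary_quantize(vec)

float32_bytes = len(raw) * raw.itemsize
int8_bytes = len(codes) * codes.itemsize
bit_bytes = len(bits)
print(float32_bytes / int8_bytes)  # → 4.0
print(float32_bytes / bit_bytes)   # → 32.0
```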

Notable Use Cases in This Vault

  • Zalando — base search layer
  • Uber — custom Lucene-based platform
  • Semantic Scholar — Elasticsearch + ML reranker
