OpenSearch

Open-source distributed search and analytics engine forked from Elasticsearch 7.10.2 in 2021 by AWS, after Elastic changed its license from Apache 2.0 to SSPL/Elastic License. Licensed under Apache 2.0; since September 2024 the project has been governed by the OpenSearch Software Foundation under the Linux Foundation. Primary managed offering: Amazon OpenSearch Service.


What It Does

OpenSearch indexes documents as JSON and exposes a REST API for full-text search, aggregations, filtering, and vector ANN search. Architecture mirrors Elasticsearch at the shard/node level — same Lucene foundation, same distributed replication model. Horizontally scalable.

Key capabilities:

  • Full-text search — BM25-based relevance scoring via Lucene
  • Dense vector search — k-NN plugin with HNSW; supports Faiss, nmslib, and native Lucene backends; up to 16,000 dimensions
  • Hybrid search — BM25 + kNN with normalization pipeline; native RRF support
  • Neural sparse retrieval — learned sparse encoding (open-source counterpart to ELSER)
  • Search pipelines — composable request/response processors for ML inference and score normalization
  • Aggregations — significant_terms, facets, histograms; foundational for Wormhole Vectors traversal
  • PPL (Piped Processing Language) — SQL-like analytics query language
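
The hybrid search bullet above mentions score normalization and RRF. The following is a minimal pure-Python sketch of those two ideas, written to show the arithmetic only; it is not the plugin's implementation. The function names are hypothetical, and the rank constant of 60 is simply the value commonly used for RRF, assumed here for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_mean_fusion(bm25_scores, knn_scores, weights=(0.5, 0.5)):
    """Normalize each score map, then combine with a weighted arithmetic mean.

    A document missing from one result list contributes 0.0 for that list.
    """
    bm25_n = min_max_normalize(bm25_scores)
    knn_n = min_max_normalize(knn_scores)
    docs = set(bm25_n) | set(knn_n)
    return {
        doc: weights[0] * bm25_n.get(doc, 0.0) + weights[1] * knn_n.get(doc, 0.0)
        for doc in docs
    }


def rrf_fusion(rankings, rank_constant=60):
    """Reciprocal rank fusion: sum 1 / (rank_constant + rank) over ranked lists."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (rank_constant + rank)
    return dict(fused)


if __name__ == "__main__":
    bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}   # raw BM25 scores (made-up values)
    knn = {"b": 0.9, "c": 0.8, "d": 0.1}     # raw similarity scores (made-up values)
    print(sorted(weighted_mean_fusion(bm25, knn).items(), key=lambda kv: -kv[1]))
    print(sorted(rrf_fusion([["a", "b", "c"], ["b", "c", "d"]]).items(), key=lambda kv: -kv[1]))
```

Min-max fusion depends on the raw score distributions of each retriever, while RRF uses ranks only, which is why RRF is often preferred when the two score scales are hard to compare.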

Retrieval Options

Mode           | Query type                      | Notes
BM25           | match, multi_match              | Default; no ML dependency
Dense semantic | knn                             | k-NN plugin; choice of Faiss / nmslib / Lucene engine
Hybrid         | hybrid + normalization pipeline | Built-in RRF and min-max normalization
Neural sparse  | neural_sparse                   | Requires a sparse encoding model
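
To make the retrieval modes above concrete, here is a sketch of request bodies for each one, built as plain Python dictionaries so they can be inspected without a running cluster. The helper names, field names, and weights are hypothetical; the query shapes follow the OpenSearch query DSL as I understand it and should be checked against the documentation for your version.

```python
def bm25_query(field, text):
    """BM25 mode: a plain match query, no ML dependency."""
    return {"query": {"match": {field: text}}}


def knn_query(vector_field, vector, k=10):
    """Dense semantic mode: k-NN clause nested under the vector field name."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": list(vector), "k": k}}},
    }


def hybrid_query(text_field, text, vector_field, vector, k=10):
    """Hybrid mode: lexical and k-NN sub-queries inside one hybrid clause."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {text_field: text}},
                    {"knn": {vector_field: {"vector": list(vector), "k": k}}},
                ]
            }
        },
    }


def neural_sparse_query(sparse_field, text, model_id):
    """Neural sparse mode: model_id must refer to a deployed sparse encoding model."""
    return {
        "query": {
            "neural_sparse": {sparse_field: {"query_text": text, "model_id": model_id}}
        }
    }


def normalization_pipeline(weights=(0.3, 0.7)):
    """Search pipeline body: min-max normalization, weighted arithmetic mean.

    The weights are illustrative and apply in sub-query order (lexical, k-NN).
    """
    return {
        "description": "Min-max normalization for hybrid search (illustrative)",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": list(weights)},
                    },
                }
            }
        ],
    }
```

With an opensearch-py client, one plausible way to use these is to register the pipeline body with a PUT to /_search/pipeline/<name> and then pass the pipeline name as the search_pipeline request parameter when sending the hybrid query.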

Vector Search: k-NN Plugin Backends

Engine | Notes
Faiss  | Facebook’s library; strong at scale; GPU support
nmslib | HNSW; good recall/speed balance; original plugin default
Lucene | Native HNSW; no plugin overhead; best for smaller indexes
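
The engine is chosen per vector field when the index is created. The sketch below builds an index body for a knn_vector field with an HNSW method and a selectable engine. The helper name, default space type, and HNSW parameter values are assumptions chosen for illustration, and the 16,000 dimension ceiling is taken from the capability list above rather than checked per engine.

```python
SUPPORTED_ENGINES = {"faiss", "nmslib", "lucene"}
MAX_DIMENSION = 16000  # limit quoted earlier in this note


def knn_index_body(vector_field, dimension, engine="faiss",
                   space_type="l2", m=16, ef_construction=128):
    """Return a create-index body with one HNSW-backed knn_vector field."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(f"dimension must be between 1 and {MAX_DIMENSION}")
    return {
        # Enables k-NN search structures for this index
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,
                        "space_type": space_type,
                        # Illustrative HNSW build parameters, not tuned values
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(knn_index_body("embedding_vector", 768, engine="lucene"), indent=2))
```

The resulting dictionary can be passed as the body of an index creation request; switching engines later generally means creating a new index and reindexing, since the method is part of the field mapping.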

AWS Integration

Deployed as Amazon OpenSearch Service, OpenSearch gains:

  • Native IAM, VPC, CloudWatch, and Kinesis integration
  • Direct connector to Amazon Bedrock for embedding inference and RAG generation
  • Managed scaling, snapshots, and patching

See Innovating Search Experience with Amazon OpenSearch and Amazon Bedrock.

Advanced Retrieval: Wormhole Vectors

Wormhole Vectors exploit OpenSearch’s k-NN plugin and significant_terms aggregations together to traverse between dense, sparse, and behavioral vector spaces — beyond what standard RRF hybrid search achieves.

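# Dense query → aggregate statistically significant keywords from results.
# Assumes `client` is an opensearch-py OpenSearch client and `query_vector`
# is a list of floats matching the dimension of `embedding_vector`.
response = client.search(
    index="products",
    body={
        "size": 50,
        # k-NN clause: the vector and k are nested under the field name
        "query": {"knn": {"embedding_vector": {"vector": query_vector, "k": 50}}},
        "aggs": {
            "wormhole_keywords": {
                # Terms over-represented in the k-NN hits relative to the whole index
                "significant_terms": {"field": "description_keywords", "size": 10}
            }
        }
    }
)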

See Wormhole Vectors Beyond Hybrid Search in OpenSearch, by Dima Kan (Aiven).
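
One way to read the "traversal" step is as a second hop: take the keywords surfaced by the significant_terms aggregation and use them to query the sparse (keyword) space. The sketch below shows that reading only and is not confirmed as the method from the referenced talk. The helper names and the min_doc_count threshold are hypothetical, and the sample response is hand-written to match the shape of a significant_terms result.

```python
def significant_keywords(response, agg_name="wormhole_keywords", min_doc_count=2):
    """Extract bucket keys from a significant_terms aggregation result."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [b["key"] for b in buckets if b.get("doc_count", 0) >= min_doc_count]


def keyword_hop_query(keywords, field="description_keywords", size=50):
    """Build a lexical query that matches documents sharing any of the keywords."""
    if not keywords:
        # Nothing significant came back, so the hop returns no documents
        return {"size": size, "query": {"match_none": {}}}
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [{"term": {field: kw}} for kw in keywords],
                "minimum_should_match": 1,
            }
        },
    }


if __name__ == "__main__":
    # Hand-written stand-in for the response of the dense query above
    sample_response = {
        "aggregations": {
            "wormhole_keywords": {
                "buckets": [
                    {"key": "waterproof", "doc_count": 14, "score": 0.8, "bg_count": 40},
                    {"key": "trail", "doc_count": 9, "score": 0.5, "bg_count": 35},
                    {"key": "limited", "doc_count": 1, "score": 0.1, "bg_count": 3},
                ]
            }
        }
    }
    hop = keyword_hop_query(significant_keywords(sample_response))
    print(hop)
```

The second query can reach documents that share vocabulary with the dense neighborhood but were not themselves among the nearest vectors.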

Performance vs. Elasticsearch

Benchmarks from 2024–2025 report Elasticsearch as consistently faster:

  • Text search / aggregations: 40–140% faster
  • Vector search: 2–12x faster

The gap is largest on complex, high-throughput workloads. At smaller scales the difference narrows and OpenSearch is often sufficient. See Elasticsearch vs OpenSearch for the full comparison.

Notable Implementations in This Vault

  • Dima Kan (Aiven) — Wormhole Vectors in production; SKG traversal across sparse, dense, and behavioral spaces
  • Canva — one of three parallel candidate generators alongside Solr and SageMaker (see Canva Search Pipeline Part II)
  • Elasticsearch — upstream project; faster, but releases after 7.10 are under SSPL / Elastic License rather than Apache 2.0 (AGPLv3 was added as an option in 2024)
  • OpenSearch Dashboards — visualization layer (Kibana fork)

Articles

Comparison