Search Platforms

Overview

A search platform is the engine that stores, indexes, and retrieves content at query time. Choosing a platform is one of the earliest and most consequential infrastructure decisions for a search team — it determines what retrieval methods are available, how much operational overhead the team carries, and what the cost structure looks like at scale.

Platforms split along two axes:

  • Open source vs. proprietary — open-source engines give full control over the stack and data; proprietary SaaS solutions trade that control for faster onboarding, managed infrastructure, and vendor-provided relevance tooling.
  • Self-hosted vs. SaaS — even open-source engines (e.g. Elasticsearch, OpenSearch) are available as managed cloud services; pure SaaS products (Algolia, Pinecone) are only available hosted.

Lucene-Based

Engines built on Apache Lucene. Lucene provides an inverted index with BM25 scoring for lexical retrieval, plus native vector fields indexed with HNSW for approximate nearest-neighbor search; the engines built on it combine the two for hybrid search. They are therefore not limited to keyword search and are increasingly used as unified retrieval backends.

  • Elasticsearch — the dominant choice for production search; rich query DSL, extensive ecosystem. Available open-source (self-hosted) and as Elastic Cloud (SaaS).
  • Apache Solr — the other major Lucene-based engine; older, more configuration-heavy, common in enterprise and media.
  • OpenSearch — AWS-led fork of Elasticsearch (post-7.x licensing split); fully open-source, available managed on AWS.
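
To make the lexical side concrete, here is a minimal sketch of an inverted index with textbook BM25 scoring. It is an illustration under stated assumptions, not Lucene's actual implementation: the function names, the whitespace tokenizer, and the sample documents are hypothetical, and k1=1.2, b=0.75 are simply commonly used defaults.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index (term -> {doc_id: term frequency}) and doc lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Score only documents that share at least one term with the query."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        df = len(postings)
        # Rare terms get a higher weight than common ones.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: long documents are penalized via b.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample corpus for illustration only.
docs = {
    "a": "red running shoes",
    "b": "blue running jacket",
    "c": "red wool socks for running in winter",
}
index, lengths = build_index(docs)
for doc_id, score in bm25_search("red shoes", index, lengths):
    print(doc_id, round(score, 3))
```

Only documents that appear in a posting list for some query term are scored, which is why an inverted index keeps lexical retrieval cheap even on large corpora.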

Vector Databases

Engines purpose-built for dense embedding retrieval. They index high-dimensional vectors using HNSW or similar approximate nearest-neighbor (ANN) structures and return results ranked by vector similarity rather than keyword overlap. Most also support hybrid search that combines keyword and vector scores.

  • Qdrant Vector DB — open-source, Rust-based, strong filtering performance; available self-hosted and as Qdrant Cloud (SaaS).
  • Weaviate Vector DB — open-source, GraphQL API, built-in vectorization modules; available self-hosted and as Weaviate Cloud (SaaS).
  • Pinecone — proprietary SaaS-only vector database; no self-hosting option; known for operational simplicity.
  • Milvus / Zilliz — open-source vector DB (Milvus) with a managed SaaS layer (Zilliz Cloud).
  • Chroma — lightweight open-source vector store popular in RAG prototyping.
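
As a rough illustration of similarity-based retrieval, the sketch below runs an exact (brute-force) nearest-neighbor search with cosine similarity. ANN structures such as HNSW approximate this same result without comparing the query against every stored vector. The function names and the tiny two-dimensional vectors are hypothetical; real embeddings typically have hundreds of dimensions, and none of the engines listed above expose this exact API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=3):
    """Exact top-k search: compare the query against every stored vector."""
    scored = [
        (doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for illustration only.
vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
for doc_id, sim in nearest_neighbors([1.0, 0.1], vectors, k=2):
    print(doc_id, round(sim, 3))
```

The brute-force scan costs one comparison per stored vector, which is the cost that ANN indexes trade a small amount of recall to avoid.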

Hybrid / Modern Search Engines

Engines designed from the ground up with relevance, hybrid retrieval, and production scale as first-class concerns.

  • Vespa — open-source, self-hosted (or Vespa Cloud SaaS); uniquely combines BM25, dense vector, and structured attribute matching in a single engine with built-in ranking expressions. Strong choice for complex ranking pipelines. See From Elasticsearch to Vespa.
  • Typesense — open-source, self-hosted or Typesense Cloud; developer-friendly, fast, opinionated defaults; positioned as a simpler alternative to Elasticsearch for search-heavy apps.
  • Meilisearch — open-source, self-hosted or Meilisearch Cloud; optimized for instant search UX, typo tolerance, and ease of setup.
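
Hybrid retrieval needs some way to merge a keyword ranking with a vector ranking. One common technique is reciprocal rank fusion (RRF), sketched below. This is an assumption chosen for illustration: the text above does not say which fusion method any particular engine uses, and the function name and sample rankings are hypothetical. The constant k=60 is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical result lists from a keyword retriever and a vector retriever.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(doc_id, round(score, 5))
```

RRF works on ranks rather than raw scores, so it sidesteps the problem that BM25 scores and vector similarities live on different scales.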

SaaS / Proprietary Platforms

Fully managed, proprietary offerings where the engine is not available for self-hosting.

  • Algolia — the dominant SaaS search platform for product and site search; strong relevance-tuning UI, extensive front-end SDKs, high price point at scale.
  • Coveo — enterprise SaaS search with ML-based ranking and personalization; targets large organizations.

Specialized / Vertical Platforms

Some platforms are not general-purpose search engines but focus on a specific domain or problem layer, typically sitting on top of an underlying engine.

Key Trade-offs When Choosing a Platform

Dimension            | Open Source / Self-Hosted  | SaaS / Proprietary
Control              | Full (data, config, infra) | Limited
Ops burden           | High                       | Low
Cost at scale        | Lower (infra only)         | Higher (per-query or per-record pricing)
Vendor lock-in       | None                       | High
Time to first result | Slower                     | Fast