Search Platforms

Overview

A search platform is the engine that stores, indexes, and retrieves content at query time. Choosing a platform is one of the earliest and most consequential infrastructure decisions for a search team — it determines what retrieval methods are available, how much operational overhead the team carries, and what the cost structure looks like at scale.

Platforms split along two axes:

Open source vs. proprietary — open-source engines give full control over the stack and data; proprietary SaaS solutions trade that control for faster onboarding, managed infrastructure, and vendor-provided relevance tooling.
Self-hosted vs. SaaS — even open-source engines (e.g. Elasticsearch, OpenSearch) are available as managed cloud services; pure SaaS products (Algolia, Pinecone) are only available hosted.

Lucene-Based

Engines built on Apache Lucene. Lucene provides a full-featured foundation: inverted index and BM25 for lexical retrieval, but also native support for vector fields (HNSW) and hybrid search. These engines are not limited to keyword search — they are increasingly used as unified retrieval backends.

Elasticsearch — the dominant choice for production search; rich query DSL, extensive ecosystem. Available open-source (self-hosted) and as Elastic Cloud (SaaS).
Apache Solr — the other major Lucene-based engine; older, more configuration-heavy, common in enterprise and media.
OpenSearch — AWS-led fork of Elasticsearch (post-7.x licensing split); fully open-source, available managed on AWS.

Vector / Semantic Search

Purpose-built for dense embedding retrieval. These engines index high-dimensional vectors (via HNSW or similar ANN structures) and return results by similarity rather than keyword overlap. Most support hybrid search combining keyword and vector scores.

Qdrant Vector DB — open-source, Rust-based, strong filtering performance; available self-hosted and as Qdrant Cloud (SaaS).
Weaviate Vector DB — open-source, GraphQL API, built-in vectorization modules; available self-hosted and as Weaviate Cloud (SaaS).
Pinecone — proprietary SaaS-only vector database; no self-hosting option; known for operational simplicity.
Milvus / Zilliz — open-source vector DB (Milvus) with a managed SaaS layer (Zilliz Cloud).
Chroma — lightweight open-source vector store popular in RAG prototyping.

Full-Featured / Hybrid Platforms

Engines designed from the ground up with relevance, hybrid retrieval, and production scale as first-class concerns.

Vespa — open-source, self-hosted (or Vespa Cloud SaaS); uniquely combines BM25, dense vector, and structured attribute matching in a single engine with built-in ranking expressions. Strong choice for complex ranking pipelines. See From Elasticsearch to Vespa.
Typesense — open-source, self-hosted or Typesense Cloud; developer-friendly, fast, opinionated defaults; positioned as a simpler alternative to Elasticsearch for search-heavy apps.
Meilisearch — open-source, self-hosted or Meilisearch Cloud; optimized for instant search UX, typo tolerance, and ease of setup.

SaaS / Proprietary Platforms

Fully managed, proprietary offerings where the engine is not available for self-hosting.

Algolia — the dominant SaaS search platform for product and site search; strong relevance-tuning UI, extensive front-end SDKs, high price point at scale.
Coveo — enterprise SaaS search with ML-based ranking and personalization; targets large organizations.

Specialized / Vertical Platforms

Some platforms are not general-purpose search engines but focus on a specific domain or problem layer, typically sitting on top of an underlying engine.

searchHub (Andreas Wagner) — e-commerce-focused SaaS platform for search optimization and query understanding. Covers query rewriting, A/B testing, analytics, and spell correction (SmartQuery). Also released the Open Commerce Search Stack (OCSS) as an open-source reference implementation for retail search. See Three Pillars of Search Quality - Findability and Common Pitfalls of Onsite Search Experimentation.
Hornet (Jo Kristian Bergum, Lester Solbakken) — retrieval infrastructure platform purpose-built for AI agents. Focuses on high-precision context delivery for agentic workloads, where query patterns differ fundamentally from human search (longer queries, web-search operators, multi-turn compounding). See This Is What Agentic Retrieval Looks Like and Mutually Assured Distraction.
Empathy.co — e-commerce search and discovery platform; SaaS, focused on search UX, analytics, and personalization.
Constructor.io — SaaS product discovery platform for e-commerce; combines search, recommendations, and browse with ML-driven ranking.

Key Trade-offs When Choosing a Platform

Dimension	Open Source / Self-Hosted	SaaS / Proprietary
Control	Full (data, config, infra)	Limited
Ops burden	High	Low
Cost at scale	Lower (infra only)	Higher (per-query or per-record pricing)
Vendor lock-in	None	High
Time to first result	Slower	Fast

E-commerce Search — platform choice is a recurring theme in ecommerce search case studies
Enterprise Search — enterprise teams frequently operate Elasticsearch or Vespa at scale
Hybrid Search Blueprint Series Semantic Boosting — hybrid retrieval patterns that span Lucene and vector platforms
From Elasticsearch to Vespa Rebuilding the Kleinanzeigen Homepage Feed — Part 1 — real migration case study
Conversational and Agentic Search — agentic workloads impose distinct retrieval requirements; Hornet is built specifically for this use case

Awesome Search KG

Explorer

Search Platforms

Search Platforms

Overview

Lucene-Based

Vector / Semantic Search

Full-Featured / Hybrid Platforms

SaaS / Proprietary Platforms

Specialized / Vertical Platforms

Key Trade-offs When Choosing a Platform

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Search Platforms

Search Platforms

Overview

Lucene-Based

Vector / Semantic Search

Full-Featured / Hybrid Platforms

SaaS / Proprietary Platforms

Specialized / Vertical Platforms

Key Trade-offs When Choosing a Platform

Related Notes

Graph View

Table of Contents

Backlinks