Migrating to Elasticsearch with dense vector for Carousell Spotlight

Background

Carousell’s Spotlight boosts listing visibility. Their custom vector search engine Carolene (Carousell + Lucene) was built in 2018 because Elasticsearch lacked dot product support at the time. After growing to 10K+ documents, Carolene had critical issues: OOM errors, index corruption, and replication failures.

Why Elasticsearch (from v7.6)

Elasticsearch 7.6 added stable dense vector support (DotProduct + CosineSimilarity), plus mature cluster orchestration, HA, index sharding, and replication.

Important caveat: dense vector operations are still O(M×N) — Elasticsearch offers better operations, not better algorithms.

Performance results

SetupP50P90P99
Carolene (stronger hardware, 13K docs, 400 QPS)~4ms~30ms~100ms
Elasticsearch POC (13K docs, 400 QPS)~43ms~77ms~121ms
Elasticsearch production (actual)~8ms~24ms~77ms

Production outperformed POC expectations significantly.

Practical tips

  1. No silver bullets — benchmark your specific configuration
  2. Separate indexes early — split by frequently filtered predicates
  3. Filter aggressively first — reduce N in O(M×N) before vector calculations
  4. Match tool to scale — dense vector for small-to-medium indexes needing high precision; ANN (FAISS, Annoy) for massive datasets prioritizing recall

People