Support Ukraine's fight for freedom

I’ve been building e-commerce search applications for more than ten years. Below is a list of some publications, conferences, and books that have inspired me, grouped by topic. If an item fits into multiple topics, it appears in multiple sections.

⭐ Star us on GitHub — it helps!

Also check out my other collections: awesome e-commerce, awesome knowledge graphs, and awesome cloud apps.

Topics

General, fun, philosophy

Embeddings

Encoder architecture
Bi-encoders / Two towers (no interaction)
Cross-encoders (early interaction)
ColBERT (late interaction)
Vector types
Dense vectors

Input size limits

Matryoshka embeddings

Context-aware embeddings

Sparse vectors

SPLADE

Constructed query vectors

Hypothetical Document Embeddings (HyDE)

Bag-of-documents

Wormhole vectors

Dimensionality handling
Dimensionality reduction
  • PCA
  • t-SNE
Quantization
  • Scalar quantization
  • Binary quantization
  • Product quantization
  • Rotational quantization
Finetuning
Supervised finetuning
Knowledge distillation
Multimodal finetuning

Vector retrieval

Reciprocal rank fusion (RRF)

Linear Score Combination

Multimodality Problems

Modality Gap
Contrastive Gap

Search Quality Assurance

Evaluation Paradigms

Session-based Evaluation

Query-based Evaluation

Random sampling
Stratified sampling
Probability-proportional-to-size sampling

Metrics

Focused on ranking quality

Focused on diversity of results

MMR
Average Pairwise Distance (APD)
  • edit distance
  • semantic distance
Entropy

Behavioral / Product / Performance

Clicks
Zero clicks
Clicks residual
Zero results

Evaluation Modes

Offline

Judgements
Human judgements
Implicit judgements

add something on clicks streams

Using LLM as judge

Online

Areas of application

Search Results

Retrieval

Relevance

Relevance Algorithms
BM25
Bayesian BM25 (BB25)

Ranking

Multi-stage ranking

Reranking

Learning to Rank

Bias

Diversification

MMR

Personalisation

Zero search results

Search UX

Baymard Institute

Nielsen Norman Group

Enterprise Knowledge LLC

Facets

Accidental Taxonomist

Other

Spelling correction

Synonyms

Stopwords

Suggestions

Synonyms: autocomplete, search as you type, suggestions

Graphs/Taxonomies/Knowledge Graph

Integrating Search and Knowledge Graphs (by Enterprise Knowledge)

Query expansion

Query understanding

Search Intent

Query segmentation

Algorithms

BERT

ColBERT

Collocations, common phrases

Other Algorithms

Hashing

Sorting by average ratings

Keywords extraction

Tracking, profiling, GDPR, Analysis

Tools, platforms, helpers for search tracking

Resources

Experiments

A/B testing, MABs

Testing, metrics, KPIs

KPIs

Evaluating Search (by Daniel Tunkelang)

Measuring Search (by James Rubinstein)

Three Pillars of Search Relevancy (by Andreas Wagner)

Architecture

Education and networking

Events

Conferences

Trainings and courses

Books

Blogs and Portals

Papers

Search Team. Management, composition, hiring

Job Interviews

Engineering

Blog post series

Search Optimization 101 (by Charlie Hull)

Query Understanding (by Daniel Tunkelang)

Better search through query understanding.

Grid Dynamics

Considering Search: Search Topics (by Derek Sisson)

Industry players

Personalities and influencers

Search Engines

  • Google
  • Bing
  • Not Human Search - Agent-first search engine for discovering AI tools and MCP servers
  • Amazon
  • eBay

Products and services

  • Algolia
  • [Vespa](https://vespa.ai/)
  • Elasticsearch - Distributed search & analytics engine
  • ParadeDB - Modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
  • Solr - Solr is the blazing-fast, open source, multi-modal search platform built on the full-text, vector, and geospatial search capabilities of Apache Lucene
  • Fess Enterprise Search Server
  • Typesense - an open source alternative to Algolia.
  • TopK - combines AI-powered query understanding with adaptive ranking to provide the most relevant results in your domain.
  • SearchHub.io
  • Datafari - an open source enterprise search solution.
  • Qdrant - an open source vector database.
  • Awakari - Real-Time search from unlimited sources like RSS, Fediverse, Telegram. Text keyword matching conditions, numeric conditions, condition groups. Reverse search index based.
  • Meilisearch - Open source search API that supports full-text, vector, geospatial & faceted search.
  • Omnigraph - Typed graph database where agents branch and merge like Git. S3-native, Rust, traversal + vector + BM25 in one runtime.
  • SearchPixel - Hybrid CLIP + BGE product-search API for Shopify and WooCommerce stores. Combines visual and semantic search, priced in INR (Indian rupees), and free during the open beta for the first 50 stores.

Consulting companies

Case studies

E-commerce

Multisided markets

Videos

Apache Solr Short Tips

Channels

Datasets

Tools

Spacy

Awesome Spacy - Natural language understanding, content enrichment, etc.

Word2Vec

Libs

Other

Other awesome stuff

Unsorted

OLD TOC (to review)