Understanding BERT and Search Relevance

Source: https://opensourceconnections.com/blog/2019/11/05/understanding-bert-and-search-relevance/ Author: Max Irwin

Summary

An early practical analysis of BERT for search relevance (2019), examining how BERT’s contextualized token embeddings work, what they cost in a search context, and where they add value.

BERT Basics for Search

BERT (Bidirectional Encoder Representations from Transformers) produces contextualized embeddings: each token’s representation depends on the surrounding tokens.

For search relevance, the key output is the [CLS] token embedding — a 768-dimensional vector representing the meaning of the entire input (query or document).

The Storage Challenge

BERT embeddings for search at scale:

Each document: 768 × 4 bytes = ~3KB for a single [CLS] vector
1M documents: ~3GB for just the CLS vectors
If storing per-token embeddings (for ColBERT-style late interaction): 1000x more
Storage is ~1000x larger than BM25 inverted index for equivalent coverage

This was a significant practical barrier in 2019; less so today with vector DBs and quantization.

The “Black Box” Challenge

Unlike BM25 (where you can see which term matched and why), BERT similarity is opaque:

Two documents with no lexical overlap can score identically to one with exact matches
Explaining why a result ranked highly requires interpretability techniques (LIME, SHAP, attention visualization)
This complicates debugging and user trust

BERT Use Cases in Search (2019 perspective)

Good fit:

Semantic query understanding (query type classification)
Cross-encoder reranking of a small candidate set (< 1000 docs)
Entity extraction from queries

Poor fit:

First-stage retrieval over millions of documents (too slow without ANN index)
Real-time query embedding with latency < 10ms (BERT was ~100ms on CPU in 2019)

Progress Since 2019

Bi-encoders + FAISS/HNSW enable BERT-based first-stage retrieval
Quantization reduces storage 4-8x
Distilled models (MiniLM, DistilBERT) cut latency 5-10x
ColBERT demonstrated that per-token late interaction improves accuracy

Key Concepts

CLS vector — 768-dim BERT embedding representing full sequence
Storage overhead — BERT embeddings ~1000x larger than inverted index
Black box opacity — no lexical explanation for BERT similarity
Cross-encoder for reranking — practical early use case for BERT in search
Bi-encoder + ANN — the solution to first-stage retrieval latency

Awesome Search KG

Explorer

Understanding BERT and Search Relevance

Understanding BERT and Search Relevance

Summary

BERT Basics for Search

The Storage Challenge

The “Black Box” Challenge

BERT Use Cases in Search (2019 perspective)

Progress Since 2019

Key Concepts

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Understanding BERT and Search Relevance

Understanding BERT and Search Relevance

Summary

BERT Basics for Search

The Storage Challenge

The “Black Box” Challenge

BERT Use Cases in Search (2019 perspective)

Progress Since 2019

Key Concepts

Related Concepts

Related Articles

People

Graph View

Table of Contents

Backlinks