Cross-Encoder

Definition

A cross-encoder processes the query and document jointly in a single encoder pass, producing a relevance score from their combined representation. It captures rich query-document interactions but requires encoding each pair separately — making it too slow for first-stage retrieval over large corpora.

How It Works

[Query + Document] → BERT encoder → relevance score

The query and document are concatenated (with separator tokens) and passed through a transformer together. The [CLS] token embedding is projected to a scalar relevance score.
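
As a rough illustration of the input layout described above, the sketch below builds the concatenated sequence and projects a [CLS] vector to a score. The names `build_pair_input` and `score_from_cls`, the pre-split token lists, and the 512-token limit are assumptions made for this example; a real model would use its own subword tokenizer and learned projection weights.

```python
from typing import List, Sequence, Tuple


def build_pair_input(
    query_tokens: List[str], doc_tokens: List[str], max_len: int = 512
) -> Tuple[List[str], List[int]]:
    """Concatenate query and document BERT-style and return (tokens, segment_ids)."""
    # Three positions are reserved for [CLS] and the two [SEP] tokens.
    budget = max_len - len(query_tokens) - 3
    if budget < 0:
        raise ValueError("query alone exceeds max_len")
    doc_tokens = doc_tokens[:budget]  # truncate the document, never the query
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + doc_tokens + ["[SEP]"]
    # Segment 0 covers [CLS] + query + [SEP]; segment 1 covers document + [SEP].
    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    return tokens, segment_ids


def score_from_cls(
    cls_embedding: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Linear projection of the [CLS] vector to a scalar relevance score."""
    return sum(w * x for w, x in zip(weights, cls_embedding)) + bias
```

Because the query tokens sit in the same sequence as the document tokens, every attention layer can relate them directly, which is what "early interaction" refers to.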

Key Properties

Property                   | Value
Query-document interaction | Full (early interaction)
Document pre-computation   | Not possible; must encode per query-doc pair
Speed                      | Slow; O(num_candidates) encoder passes at query time
Quality                    | Highest; rich interaction captures subtle relevance
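
To make the speed row concrete, here is a back-of-the-envelope sketch that counts encoder passes at query time. The function names and the 5 ms per-pass latency are hypothetical placeholders, not measurements, and the bi-encoder figure ignores the cost of the vector search itself.

```python
def query_time_passes(num_candidates: int, architecture: str) -> int:
    """Number of transformer forward passes needed to score one query."""
    if architecture == "cross-encoder":
        # One joint pass per (query, document) pair; nothing can be cached.
        return num_candidates
    if architecture == "bi-encoder":
        # Document vectors are precomputed offline; only the query is encoded.
        return 1
    raise ValueError(f"unknown architecture: {architecture}")


def estimated_seconds(
    num_candidates: int, architecture: str, ms_per_pass: float = 5.0
) -> float:
    """Rough latency estimate; ms_per_pass is an assumed, illustrative value."""
    return query_time_passes(num_candidates, architecture) * ms_per_pass / 1000.0
```

Under these assumed numbers, scoring a 1,000,000-document corpus with a cross-encoder would take about 5,000 seconds per query, while scoring 100 candidates would take about 0.5 seconds.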

Role in Multi-Stage Retrieval

Cross-encoders are typically used as rerankers in a two-stage pipeline:

Stage 1: Bi-encoder retrieves top-100 candidates (fast)
Stage 2: Cross-encoder reranks top-100 (slow, but small set)

This pipeline combines the speed of bi-encoder retrieval with the quality of cross-encoder scoring, although the final ranking can only contain documents that Stage 1 retrieved.
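
The two stages can be sketched as a single function with pluggable scorers. Everything named here (`retrieve_then_rerank`, `toy_first_stage`, `toy_pair_score`) is hypothetical: the toy scorers are simple word-overlap stand-ins so the example runs without a model, and in practice they would be replaced by a bi-encoder similarity and a cross-encoder forward pass.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage_score: Scorer,
    pair_score: Scorer,
    k: int = 100,
    top_n: int = 10,
) -> List[str]:
    """Two-stage retrieval: cheap scoring over the corpus, expensive scoring over top-k."""
    # Stage 1: score every document cheaply and keep the k best candidates.
    candidates = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:k]
    # Stage 2: run the expensive pair scorer on those k candidates only.
    reranked = sorted(
        candidates, key=lambda doc: pair_score(query, doc), reverse=True
    )
    return reranked[:top_n]


def toy_first_stage(query: str, doc: str) -> float:
    """Stand-in for a bi-encoder: number of distinct shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_pair_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: fraction of document words found in the query."""
    query_terms = set(query.lower().split())
    tokens = doc.lower().split()
    return sum(token in query_terms for token in tokens) / max(len(tokens), 1)
```

The expensive scorer is called once per candidate, so its cost depends on k rather than on the corpus size.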

Training

  • Trained on query-document pairs with binary or graded relevance labels
  • MS MARCO (passage ranking) is the most widely used training dataset
  • Can use knowledge distillation from larger cross-encoders
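
The objectives in the list above might look like the following sketch. It assumes a pointwise binary cross-entropy loss for labeled pairs and a plain mean-squared-error loss for distillation; both are common choices rather than the only ones, and the function names are made up for this example.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pointwise_bce(score: float, label: float) -> float:
    """Binary cross-entropy between sigmoid(score) and a label in [0, 1].

    A graded label can be used by first rescaling it into [0, 1].
    """
    p = sigmoid(score)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def distillation_mse(
    student_scores: Sequence[float], teacher_scores: Sequence[float]
) -> float:
    """Mean squared error between student scores and a larger teacher's scores."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared) / len(squared)
```

With distillation, the student learns from the teacher's continuous scores, which carry more signal per pair than a 0/1 label.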

vs. Other Architectures

Aspect      | Cross-Encoder | Bi-Encoder | ColBERT
Interaction | Early (joint) | None       | Late (token-level)
Speed       | Slow          | Fast       | Medium
Quality     | Best          | Good       | Near cross-encoder
Scalability | Not scalable  | Scalable   | Scalable
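
The Interaction row can be illustrated by how each precomputable architecture turns vectors into a score. This is a minimal sketch using plain Python lists as embeddings; the function names are assumptions, and the late-interaction function follows the ColBERT-style MaxSim idea in simplified form. A cross-encoder has no comparable formula, since its score comes out of a joint transformer pass over both texts.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bi_encoder_score(query_vec: Vector, doc_vec: Vector) -> float:
    """No token-level interaction: one vector per text, compared by a dot product."""
    return dot(query_vec, doc_vec)


def late_interaction_score(
    query_token_vecs: Sequence[Vector], doc_token_vecs: Sequence[Vector]
) -> float:
    """MaxSim: each query token takes its best-matching document token; sum the maxima."""
    return sum(
        max(dot(q, d) for d in doc_token_vecs) for q in query_token_vecs
    )
```

In both functions the document vectors can be computed offline, which is why these two architectures scale to large corpora while the cross-encoder does not.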

Related

  • Bi-Encoder: faster retrieval model it complements
  • ColBERT: alternative late-interaction model bridging speed and quality
  • Late Interaction: between bi-encoder (none) and cross-encoder (early) interaction
  • Retrieval Pipeline: cross-encoder as Stage 2 reranker
  • ELSER: distilled from a cross-encoder teacher
