ColBERT

Definition

ColBERT (Contextualized Late Interaction over BERT) is a neural retrieval model that represents queries and documents as sets of per-token vectors rather than single pooled vectors. Relevance is computed with the MaxSim operator: for each query token, take the maximum dot product against all document tokens, then sum these maxima over the query tokens.

Created by Omar Khattab and Matei Zaharia at Stanford University, published at SIGIR 2020.

Architecture

Query tokens → BERT → [q1, q2, ..., qm]   (query token vectors)
Doc tokens   → BERT → [d1, d2, ..., dn]   (doc token vectors)

Score = Σᵢ max_j (qᵢ · dⱼ)    (MaxSim)
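
The MaxSim formula above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: plain Python lists stand in for the BERT output tensors, and the names `dot`, `maxsim_score`, `Q`, and `D` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum over query tokens of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy input: 2 query token vectors and 3 document token vectors (made-up values).
Q = [[1.0, 0.0], [0.0, 1.0]]
D = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]

# q1 matches d2 best (1.0), q2 matches d1 best (0.5), so the score is 1.5.
score = maxsim_score(Q, D)
```

Because the maximum is taken per query token, the score depends only on each query token's best match; document tokens that match nothing contribute nothing.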

Special tokens:

  • [Q] prefix for queries (padded with [MASK] tokens to a fixed length, known as query augmentation)
  • [D] prefix for documents
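
As a rough sketch of how these markers shape the input, the helpers below build token sequences as plain strings. Several details are assumptions for illustration: the function names are hypothetical, the default length of 32 and the placement of `[CLS]` and `[SEP]` follow common BERT conventions, and a real tokenizer would work with token ids rather than strings.

```python
from typing import List


def build_query_tokens(tokens: List[str], query_maxlen: int = 32) -> List[str]:
    """Prefix a query with [Q] and pad it to a fixed length with [MASK]."""
    # Reserve three slots for [CLS], [Q], and [SEP]; truncate longer queries.
    seq = ["[CLS]", "[Q]"] + tokens[: query_maxlen - 3] + ["[SEP]"]
    # Remaining slots are filled with [MASK] tokens.
    return seq + ["[MASK]"] * (query_maxlen - len(seq))


def build_doc_tokens(tokens: List[str]) -> List[str]:
    """Prefix a document with [D]; documents are not padded with [MASK]."""
    return ["[CLS]", "[D]"] + tokens + ["[SEP]"]


q = build_query_tokens(["what", "is", "late", "interaction"], query_maxlen=8)
d = build_doc_tokens(["colbert", "scores", "token", "pairs"])
```

With `query_maxlen=8`, the four-word query above ends with a single `[MASK]` token; shorter queries receive more.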

Versions

Version              Year   Key Innovation
ColBERT v1           2020   Original late interaction model
ColBERTv2            2021   Denoised supervision + residual compression (6-10x storage reduction)
jina-colbert-v1-en   2024   Extended to 8192 tokens (Han Xiao / Jina AI)
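
The idea behind residual compression can be sketched as follows: store each token vector as the id of its nearest centroid plus a coarsely quantized residual. This is a simplified illustration under stated assumptions, not the ColBERTv2 scheme itself. The fixed residual range (`lo`, `hi`), the uniform buckets, and the hand-picked centroids are all hypothetical; a real system would derive both centroids and bucket boundaries from the data.

```python
from typing import List, Sequence, Tuple


def compress(vec: Sequence[float], centroids: Sequence[Sequence[float]],
             bits: int = 2, lo: float = -0.5, hi: float = 0.5) -> Tuple[int, List[int]]:
    """Return (centroid id, per-dimension bucket codes) for one vector."""
    # Nearest centroid by squared Euclidean distance.
    cid = min(range(len(centroids)),
              key=lambda c: sum((v - x) ** 2 for v, x in zip(vec, centroids[c])))
    levels = 2 ** bits
    width = (hi - lo) / levels
    codes = []
    for v, x in zip(vec, centroids[cid]):
        bucket = int((v - x - lo) / width)             # uniform bucket index
        codes.append(min(levels - 1, max(0, bucket)))  # clamp out-of-range residuals
    return cid, codes


def decompress(cid: int, codes: Sequence[int], centroids: Sequence[Sequence[float]],
               bits: int = 2, lo: float = -0.5, hi: float = 0.5) -> List[float]:
    """Rebuild an approximate vector from the centroid and bucket midpoints."""
    width = (hi - lo) / 2 ** bits
    return [x + lo + (c + 0.5) * width for x, c in zip(centroids[cid], codes)]


centroids = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical, hand-picked
vec = [0.9, 0.2]
cid, codes = compress(vec, centroids)
approx = decompress(cid, codes, centroids)
```

For residuals inside the assumed range, the reconstruction error per dimension is bounded by half a bucket width, which is 0.125 with 2 bits over [-0.5, 0.5].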

Key Advantages

  1. Quality near cross-encoders: token-level interaction captures nuanced relevance
  2. Scalability of bi-encoders: documents are pre-encoded offline, so only queries are encoded at runtime
  3. Explainability: the per-token MaxSim matches reveal which tokens drove retrieval (unlike single-vector dense embeddings)
  4. Training efficiency: reported to need fewer labeled examples than single-vector models
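
The explainability point can be made concrete by recording, for each query token, which document token supplied the maximum. The sketch below uses made-up tokens and two-dimensional vectors purely for illustration; `maxsim_alignment` is a hypothetical helper, not part of any library.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_alignment(query_tokens: Sequence[str], query_vecs: Sequence[Sequence[float]],
                     doc_tokens: Sequence[str], doc_vecs: Sequence[Sequence[float]]
                     ) -> List[Tuple[str, str, float]]:
    """For each query token, return (query token, best document token, similarity)."""
    rows = []
    for q_tok, q in zip(query_tokens, query_vecs):
        sims = [dot(q, d) for d in doc_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)  # index of the max
        rows.append((q_tok, doc_tokens[best], sims[best]))
    return rows


# Hypothetical toy vectors; real ones would come from the encoder.
rows = maxsim_alignment(
    ["cheap", "flights"], [[1.0, 0.0], [0.0, 1.0]],
    ["low", "cost", "airfare"], [[0.9, 0.1], [0.7, 0.0], [0.1, 0.8]],
)
total = sum(sim for _, _, sim in rows)  # equals the MaxSim score
```

Here "cheap" aligns with "low" and "flights" with "airfare"; the per-row similarities add up to the overall score, so each row shows how much one query token contributed.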

Compression (Vespa implementation)

Asymmetric binarization by Jo Kristian Bergum:

  • Query vectors: float (full precision)
  • Document vectors: binarized to 1 bit per dimension and packed into int8 cells
  • Result: 32x compression relative to float32, with minimal accuracy loss
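
A minimal sketch of the asymmetric scheme, assuming sign-based binarization and 128-dimensional vectors: document vectors keep one bit per dimension packed eight to a byte, while query vectors stay in floating point and are scored against the unpacked bits. The function names are hypothetical, and expanding bits to 0.0/1.0 (rather than -1/+1) is an assumption of this sketch.

```python
from typing import List, Sequence


def binarize_and_pack(vec: Sequence[float]) -> bytes:
    """Keep only the sign of each dimension and pack 8 bits into each byte."""
    assert len(vec) % 8 == 0, "dimension must be a multiple of 8"
    out = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for v in vec[i:i + 8]:
            byte = (byte << 1) | (1 if v > 0 else 0)  # most significant bit first
        out.append(byte)
    return bytes(out)


def unpack_bits(packed: bytes) -> List[float]:
    """Expand each stored bit back to 0.0 or 1.0."""
    return [float((b >> s) & 1) for b in packed for s in range(7, -1, -1)]


def asymmetric_maxsim(query_vecs: Sequence[Sequence[float]],
                      packed_docs: Sequence[bytes]) -> float:
    """MaxSim with float query vectors against unpacked binary document vectors."""
    doc_vecs = [unpack_bits(p) for p in packed_docs]
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


dim = 128
doc_vec = [0.3 if i % 2 == 0 else -0.3 for i in range(dim)]
packed = binarize_and_pack(doc_vec)
ratio = (dim * 4) / len(packed)  # float32 bytes divided by packed bytes
```

A 128-dimensional float32 vector takes 512 bytes, and its packed form takes 16 bytes, which is where the 32x figure comes from.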

vs. Other Architectures

Model           Interaction                 Speed    Quality
Bi-Encoder      None (separate encoding)    Fast     Good
Cross-Encoder   Early (joint encoding)      Slow     Best
ColBERT         Late (token-level)          Medium   Near cross-encoder

Tools

  • RAGatouille — Python library for ColBERT in RAG pipelines
  • Vespa — native ColBERT embedder
  • FAISS — indexing for ColBERT document vectors
