ColPali & Elasticsearch: How to Search Complex Documents
Part 1 of Elastic’s ColPali series. Introduces ColPali and shows how to index and search visually complex documents — tables, figures, multi-column layouts — in Elasticsearch without OCR, layout detection, or semantic chunking pipelines. (Part 2 is Late Interaction Models - How to Scale and Optimize in Elasticsearch.)
The Problem ColPali Solves
Traditional retrieval over complex documents requires a brittle pipeline — OCR, layout detection, semantic chunking. ColPali (introduced 2024) replaces all of it by embedding the page image directly, building on the Late Interaction mechanism from ColBERT and the vision capabilities of the PaliGemma backbone.
How It Works
- Indexing — a page screenshot is divided into a grid (e.g. 32×32 → ~1024 patches); each patch becomes a 128-dim vector capturing its contextual meaning. No text extraction.
- Query — the model produces one vector per query token.
- Scoring — for each query vector, take the max dot product against all document vectors, then sum across query vectors (MaxSim).
Why It Matters
- Messy real-world data — the ViDoRe benchmark, released with the ColPali paper, covers government, healthcare, and research document images. ColPali outperforms complex retrieval pipelines and image-embedding baselines.
- Interpretability — unlike single-vector bi-encoders, late interaction reveals where and why a document matched (per-patch match heatmaps).
Searching with ColPali in Elasticsearch
From Elasticsearch 8.18 (tech preview), late-interaction models are supported via the new rank_vectors field type and the maxSimDotProduct scoring function:
mappings = {"mappings": {"properties": {"col_pali_vectors": {"type": "rank_vectors"}}}}
# score:
"script": {"source": "maxSimDotProduct(params.query_vector, 'col_pali_vectors')"}Successors such as ColQwen have since been released. Part 2 covers making this production-ready (bit vectors, average vectors, token pooling, two-stage rescore).
Authors
Related Concepts
ColPali · Late Interaction · ColBERT · Dense Vector Retrieval · Reranking · Multimodal Embeddings
Related Articles
- Late Interaction Models - How to Scale and Optimize in Elasticsearch — Part 2 (scaling)
- What is ColBERT and Late Interaction and Why They Matter in Search