The Missing WHERE Clause in Vector Search
Source: https://www.pinecone.io/learn/vector-search-filtering/
Author: James Briggs (Pinecone)
Summary
James Briggs explains why metadata filtering in vector search is hard, why naive pre-filter and post-filter approaches fail at the extremes, and how Pinecone’s single-stage approach solves the problem.
The SQL Analogy
SQL has a natural WHERE clause:
SELECT * FROM products
WHERE category = 'shoes' AND price < 100
ORDER BY relevance DESC
LIMIT 10;Vector search should have the equivalent:
search(query_vector, filter={"category": "shoes", "price__lt": 100}, top_k=10)
But this is technically non-trivial because ANN indexes don’t natively support arbitrary predicates.
Why Pre-filtering Fails
Pre-filter: apply metadata filter → run ANN on filtered subset.
Problem at high selectivity (e.g., filter matches 0.1% of corpus):
- ANN indexes need a minimum corpus size to work well
- With only 0.1% of docs, the “index” is just a few thousand vectors
- Quality degrades to near-exact search anyway
- Index structures built for the full corpus don’t apply
Why Post-filtering Fails
Post-filter: run ANN on full corpus → apply metadata filter to top-K.
Problem at high selectivity:
Query: find shoes under $10 (0.1% of catalog)
ANN returns top-1000 by vector similarity
Post-filter: only 1 of 1000 matches "price < $10"
Effective recall: catastrophic
Need to over-retrieve 100–1000x to compensate → kills latency.
Pinecone’s Single-Stage Solution
The key insight: during HNSW graph traversal, apply the metadata filter inline:
# Conceptually: filter is a pruning function on graph traversal
def hnsw_filtered_search(query, filter_fn, k):
candidates = []
for node in traverse_hnsw_graph(query):
if filter_fn(node.metadata): # apply filter during traversal
candidates.append(node)
if len(candidates) >= k:
break
return candidatesBy integrating the filter into the traversal, the system:
- Never retrieves docs that fail the filter
- Continues traversal until k matching docs are found
- Maintains recall even at high selectivity
Metadata Storage Architecture
# Pinecone upsert with metadata
index.upsert([{
"id": "product_123",
"values": embedding_vector,
"metadata": {
"category": "shoes",
"price": 89.99,
"brand": "Nike",
"in_stock": True
}
}])
# Filtered query
results = index.query(
vector=query_embedding,
filter={"category": "shoes", "price": {"$lt": 100}},
top_k=10
)Performance Characteristics
| Filter Selectivity | Pre-filter | Post-filter | Single-stage (Pinecone) |
|---|---|---|---|
| 50% | Good | Good | Good |
| 10% | Degraded | Good | Good |
| 1% | Poor | Degraded | Good |
| 0.1% | Broken | Broken | Good |
Single-stage maintains consistent quality across all selectivity levels.
Related Articles
- Nearest Neighbor Indexes for Similarity Search — same author, ANN infrastructure
- SPLADE for Sparse Vector Search Explained — same author, sparse approach
- How Context-Aware Embeddings Are Transforming Enterprise Search — enterprise filtering use case
Related Concepts
- Vector Filtering — primary topic
- Dense Vector Retrieval — underlying infrastructure
- Semantic Search — use case
- RAG — often needs filtered retrieval (per-document, per-user)
- Hybrid Search — filtering applies to both legs
People
- James Briggs — author (Pinecone)