Canva Search Pipeline Part II
Source: https://canvatechblog.com/search-pipeline-part-ii-3b43978607cd
Author: Stuart Cam (Canva Search & Recommendations team)
Summary
Describes the new componentized search pipeline architecture that replaced Canva’s “big ball of mud”. Core principle: isolated, stateless, immutable pipeline phases behind a shared interface.
New Architecture: Pipeline Phases
Validation & Rewriting
↓
Tokenization (locale-specific + semantic)
↓
Annotation (NER, language detection, synonyms)
↓
Candidate Generation (OpenSearch, Solr, SageMaker — parallel)
↓
Feature Extraction (Redis-backed feature stores)
↓
Re-ranking (business rules + ML models)
Key Design Principles
- Isolated components: no shared state between concurrent steps
- Stateless execution: state managed externally (Redis/DynamoDB)
- Immutable flow: later steps cannot mutate the output of earlier steps
- Shared interface: all pipeline types expose the same abstract interface
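The principles above can be illustrated with a minimal sketch. It is not Canva's code: the names `QueryContext`, `Phase`, `Rewrite`, `Tokenize`, `Annotate`, and `run_pipeline` are hypothetical, and the toy logic inside each phase only stands in for the real work. It shows a frozen context object (immutable flow), phases that hold no state of their own (stateless execution), and a single `run` method shared by every phase (shared interface).

```python
from dataclasses import dataclass, replace
from typing import Protocol, Sequence, Tuple


@dataclass(frozen=True)
class QueryContext:
    """Immutable value handed from phase to phase (hypothetical shape)."""
    raw_query: str
    tokens: Tuple[str, ...] = ()
    annotations: Tuple[Tuple[str, str], ...] = ()


class Phase(Protocol):
    """Shared interface: a phase maps one context to a new context."""
    def run(self, ctx: QueryContext) -> QueryContext: ...


class Rewrite:
    def run(self, ctx: QueryContext) -> QueryContext:
        # Lowercase and collapse whitespace; returns a copy, never mutates.
        return replace(ctx, raw_query=" ".join(ctx.raw_query.lower().split()))


class Tokenize:
    def run(self, ctx: QueryContext) -> QueryContext:
        # Toy whitespace tokenizer standing in for locale-specific logic.
        return replace(ctx, tokens=tuple(ctx.raw_query.split()))


class Annotate:
    COLOURS = frozenset({"red", "blue", "green"})  # assumed toy vocabulary

    def run(self, ctx: QueryContext) -> QueryContext:
        # Toy annotator standing in for NER, language detection, synonyms.
        tags = tuple((t, "colour") for t in ctx.tokens if t in self.COLOURS)
        return replace(ctx, annotations=tags)


def run_pipeline(phases: Sequence[Phase], ctx: QueryContext) -> QueryContext:
    # Each phase sees the previous output and produces a fresh context.
    for phase in phases:
        ctx = phase.run(ctx)
    return ctx


result = run_pipeline(
    [Rewrite(), Tokenize(), Annotate()],
    QueryContext("  Red   Birthday CARD "),
)
```

Because `QueryContext` is frozen, a later phase that tries to assign to a field of an earlier result raises an error instead of silently changing it. Any state that must outlive a request would live outside the phases, as the notes say (Redis/DynamoDB).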
ML & Performance Optimizations
- Parallel candidate generators with deadline-based execution
- 50ms deadline — generate as many candidates as possible; stop at deadline
- Pagination strategy: over-fetch the top 500 results for early pages (precision priority); faster execution for deep pages (bot handling)
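Deadline-based execution of parallel candidate generators can be sketched with `asyncio`. This is an assumption-laden illustration, not the article's implementation: `generate_candidates` and `fake_generator` are hypothetical names, the generators are simulated with sleeps rather than real OpenSearch, Solr, or SageMaker calls, and the sketch keeps only the results of generators that finish before the deadline (it does not collect partial results from a generator that is still running).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Generator = Callable[[str], Awaitable[List[str]]]


async def generate_candidates(
    query: str,
    generators: Sequence[Generator],
    deadline_s: float = 0.050,  # 50 ms, the deadline named in the notes
) -> List[str]:
    """Run all generators concurrently; keep whatever finished in time."""
    if not generators:
        return []
    tasks = [asyncio.create_task(g(query)) for g in generators]
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)

    # Stop at the deadline: cancel stragglers and let the cancellations settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    candidates: List[str] = []
    for task in tasks:  # iterate in generator order for stable output
        if task in done and not task.cancelled() and task.exception() is None:
            candidates.extend(task.result())
    return candidates


def fake_generator(name: str, delay_s: float) -> Generator:
    """Hypothetical stand-in for a backend call with a fixed latency."""
    async def generate(query: str) -> List[str]:
        await asyncio.sleep(delay_s)
        return [f"{name}:{query}"]
    return generate


found = asyncio.run(
    generate_candidates(
        "cat",
        [fake_generator("fast", 0.001), fake_generator("slow", 1.0)],
        deadline_s=0.2,
    )
)
```

In this run only the fast generator contributes, since the slow one is cancelled when the deadline passes. A failing generator is skipped in the same way, so one bad backend does not fail the whole request.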
Design Process
Collaborative prototyping: each engineer tackled an architecture area they cared about, and the outputs were consolidated into an initial design.
Impact
Enabled migration of audio, font, video, and media search to the shared architecture. “A shared mental model is key to limiting inaccuracies and contradictions.”
People
- Stuart Cam (Canva)