Search at Slack
Source: https://slack.engineering/search-at-slack-431f8c80619e
Authors: Isabella Tromba (Software Engineer, SLI), John Gallagher, Jason Liszka (Senior Staff Engineer)
Summary
How Slack built a two-stage Learning to Rank (LTR) system for message search, and how it handled challenges specific to enterprise communication search: queries rarely repeat, and each user can access a different set of documents.
Architecture
Two-Stage System:
- First stage: Solr sorts matching messages with a custom sort over features that are cheap for Solr to compute (fast)
- Second stage: the application layer re-ranks the results returned by Solr using additional features (accurate)
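The two-stage flow can be sketched in a few lines. This is a minimal illustration, not Slack's code: the `Candidate` fields, the cheap first-stage formula (match score discounted by age), and the `score_fn` passed to the second stage are all hypothetical stand-ins for Solr's custom sort and the application-layer model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    message_id: str
    lucene_score: float     # cheap: already computed by Solr
    age_days: float         # cheap: stored on the document
    author_affinity: float  # hypothetical "expensive" feature computed only in the app layer


def first_stage(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Cheap sort standing in for Solr's custom sort; keeps only the top n."""
    # Hypothetical cheap score: match score discounted by message age.
    return sorted(candidates, key=lambda c: c.lucene_score / (1.0 + c.age_days), reverse=True)[:n]


def second_stage(candidates: List[Candidate], score_fn: Callable[[Candidate], float]) -> List[Candidate]:
    """Re-rank the shortlist with a richer (slower) scoring function."""
    return sorted(candidates, key=score_fn, reverse=True)


def two_stage_search(candidates: List[Candidate], n: int,
                     score_fn: Callable[[Candidate], float]) -> List[Candidate]:
    return second_stage(first_stage(candidates, n), score_fn)
```

The design choice is the usual speed/accuracy split: only the shortlist of `n` candidates pays for the expensive features, so `n` bounds the cost of the second stage.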
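LTR model: a linear SVM (SparkML) trained on click data after a pairwise transform, so that each training example is the feature difference between two results for the same query.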
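A minimal sketch of the pairwise transform followed by a linear SVM. The post used SparkML's SVM; here a plain-Python subgradient trainer for the L2-regularized hinge loss stands in for it, and the row format `(query_id, features, clicked)` plus the hyperparameters are assumptions for illustration.

```python
import itertools
from typing import Hashable, List, Sequence, Tuple

Vector = List[float]


def pairwise_transform(rows: Sequence[Tuple[Hashable, Vector, bool]]) -> Tuple[List[Vector], List[int]]:
    """Turn (query_id, features, clicked) rows into pairwise difference examples.

    Only pairs from the same query with different click labels are kept; the label
    is +1 if the first result of the pair was clicked and the second was not, else -1.
    """
    by_query = {}
    for qid, x, clicked in rows:
        by_query.setdefault(qid, []).append((x, clicked))
    diffs, labels = [], []
    for results in by_query.values():
        for (xi, ci), (xj, cj) in itertools.combinations(results, 2):
            if ci == cj:
                continue  # no preference between two clicked or two unclicked results
            diffs.append([a - b for a, b in zip(xi, xj)])
            labels.append(1 if ci else -1)
    return diffs, labels


def train_linear_svm(X: List[Vector], y: List[int], reg: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Vector:
    """Full-batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [reg * wk for wk in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wk * xk for wk, xk in zip(w, xi))
            if margin < 1.0:  # hinge term is active for this pair
                for k, xk in enumerate(xi):
                    grad[k] -= yi * xk / len(X)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def score(w: Vector, x: Vector) -> float:
    """Rank individual results by w . x using the weights learned on differences."""
    return sum(wk * xk for wk, xk in zip(w, x))
```

Because the model is linear, weights learned on feature differences can score single results directly, which is what makes the pairwise transform practical at serving time.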
Unique Data Challenges
- Slack queries rarely repeat (unlike web search)
- Each user can access a different set of private documents
- Solution: rely on the “work graph” (each user's history of interactions within Slack) instead of click data aggregated across users
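One way to turn interaction history into a per-user signal is a weighted graph from which an affinity score can be read. The sketch below is hypothetical: the `WorkGraph` class, the idea of weighting interactions, and the normalization (share of a user's total interaction weight) are assumptions, not details from the post.

```python
from collections import defaultdict
from typing import DefaultDict


class WorkGraph:
    """Hypothetical per-user interaction graph (edge weight = accumulated interactions)."""

    def __init__(self) -> None:
        self._edges: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))

    def record(self, user: str, other: str, weight: float = 1.0) -> None:
        """Log an interaction from `user` toward `other` (e.g. a reply, mention, or DM)."""
        self._edges[user][other] += weight

    def affinity(self, user: str, author: str) -> float:
        """Fraction of `user`'s interaction weight that went to `author`; 0.0 with no history."""
        neighbors = self._edges.get(user)  # .get avoids creating an empty entry
        if not neighbors:
            return 0.0
        total = sum(neighbors.values())
        return neighbors.get(author, 0.0) / total if total > 0 else 0.0
```

Because every score is computed from the searcher's own edges, it works even for a query nobody has issued before, which is the gap aggregate click data cannot fill.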
Position Bias Handling
Users clicked top results 30% more often purely because of their position. Fix: oversample clicks on lower-ranked results so that clicks in the training data are evenly distributed across positions.
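A minimal sketch of the oversampling step, assuming clicks are stored as `(query_id, position)` pairs and that every position is brought up to the count of the most-clicked position. The post does not say how the samples were drawn; deterministic cycling is used here so the result is reproducible, and random sampling with replacement would be a natural alternative. `equalize_click_positions` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Click = Tuple[str, int]  # (query_id, 1-based position of the clicked result)


def equalize_click_positions(clicks: List[Click]) -> List[Click]:
    """Oversample clicks at under-represented positions until every position
    has as many clicks as the most-clicked one (usually position 1)."""
    if not clicks:
        return []
    by_position: Dict[int, List[Click]] = defaultdict(list)
    for click in clicks:
        by_position[click[1]].append(click)
    target = max(len(bucket) for bucket in by_position.values())
    balanced: List[Click] = []
    for position in sorted(by_position):
        bucket = by_position[position]
        # Deterministic oversampling: cycle through the bucket until it reaches `target`.
        balanced.extend(bucket[i % len(bucket)] for i in range(target))
    return balanced
```

Duplicating rare low-position clicks raises their weight in training, so the model is less likely to learn "position 1" as a proxy for relevance.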
Top Ranking Signals
- Message age (recency)
- Lucene match score
- User affinity to message author
- Priority scores for channels and DMs
- Message metadata (pins, stars, reactions)
- Content characteristics (word count, formatting)
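The signals listed above can be packed into a feature vector for a linear ranker. This is a sketch under stated assumptions: the `MessageSignals` fields, the log transforms, and the feature order are hypothetical choices, not the post's actual feature definitions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MessageSignals:
    """Hypothetical container for the ranking signals listed above."""
    age_hours: float         # recency
    lucene_score: float      # text match score from Lucene/Solr
    author_affinity: float   # searcher's affinity to the message author, in [0, 1]
    channel_priority: float  # priority score of the channel or DM
    pins: int
    stars: int
    reactions: int
    word_count: int
    has_formatting: bool     # e.g. code blocks, lists, bold text


def feature_vector(s: MessageSignals) -> List[float]:
    # Log transforms (an assumption) damp the effect of very large ages and counts.
    return [
        -math.log1p(s.age_hours),  # larger for newer messages
        s.lucene_score,
        s.author_affinity,
        s.channel_priority,
        float(s.pins),
        float(s.stars),
        math.log1p(s.reactions),
        math.log1p(s.word_count),
        1.0 if s.has_formatting else 0.0,
    ]


def linear_score(weights: Sequence[float], s: MessageSignals) -> float:
    """Dot product of learned weights with the message's features."""
    features = feature_vector(s)
    if len(weights) != len(features):
        raise ValueError(f"expected {len(features)} weights, got {len(weights)}")
    return sum(w * f for w, f in zip(weights, features))
```

Encoding age as a negative log keeps "newer is better" monotonic for a positive weight while flattening the difference between, say, a 300-hour and a 400-hour-old message.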
Results
- 9% increase in clicked searches
- 27% increase in clicks at position 1
- Launch of Top Results, a feature that combines the most relevant results with the most recent ones
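A minimal sketch of how a relevant-plus-recent view could be assembled, assuming the `k` most relevant results are shown first and excluded from a reverse-chronological list beneath them. The `Result` type, `top_then_recent`, and the default `k=3` are hypothetical; the post does not specify this logic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Result:
    message_id: str
    relevance: float  # model score from the re-ranking stage
    timestamp: float  # e.g. Unix seconds


def top_then_recent(results: List[Result], k: int = 3) -> Tuple[List[Result], List[Result]]:
    """Split results into the k most relevant and the rest in reverse-chronological order."""
    top = sorted(results, key=lambda r: r.relevance, reverse=True)[:k]
    top_ids = {r.message_id for r in top}
    # Exclude the top results so no message appears twice.
    recent = sorted(
        (r for r in results if r.message_id not in top_ids),
        key=lambda r: r.timestamp,
        reverse=True,
    )
    return top, recent
```

Keeping the recent list chronological preserves the behavior users already expect from Slack search, while the short relevance block surfaces what the model ranks highest.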