Slack — Enterprise Message Search with LTR
Problem
Message search in Slack has two fundamental differences from web or e-commerce search:
- Queries rarely repeat — people search for specific messages in specific contexts; there’s no stable query distribution to build signals from
- Documents are private and user-specific — every user accesses a unique document set (their channels, DMs, files); aggregate click signals don’t generalize across users
Standard LTR training on aggregate click data doesn’t work. The team needed a signal source that is personal, rich, and available despite query uniqueness.
Architecture
Two-stage ranking:
- First stage — Solr custom sorting on features cheap for Solr to compute (fast retrieval)
- Second stage — Application-layer re-ranking with richer features (accurate, slower)
LTR model: SVM via SparkML, trained on pairwise-transformed click data.
Key Innovation: Work Graph as Signal Source
Instead of aggregate query-click logs, Slack used the work graph — internal interaction history per user:
- Messages from people you interact with frequently rank higher
- Channels and DMs you prioritize rank higher
- Pins, stars, and reactions surface important content
This personal interaction graph provides stable, high-quality relevance signal that doesn’t depend on query repetition.
Position Bias Fix
Discovery: users clicked top results 30% more often purely due to position.
Fix: oversample clicks on lower-ranked results during training data construction to equalize positional distribution. This prevents the model from learning “rank 1 is always clicked” as a spurious feature.
Top Ranking Signals
- Message recency (age)
- Lucene BM25 match score
- User affinity to message author (from work graph)
- Priority scores for channels and DMs
- Message metadata: pins, stars, reactions count
- Content characteristics: word count, formatting richness
Results
- +27% clicks at position 1 — top result is more often the right result
- +9% overall increase in clicked searches
Key Lessons
- When query repetition is low, the user’s behavioral graph is a better training signal than aggregate click logs
- Position bias correction is mandatory before training pairwise LTR — uncorrected data teaches the model to rank popular positions, not relevant documents
- Two-stage ranking lets you put cheap features in Solr and expensive features in app layer — useful when the underlying search engine is a black box
- Work graph features (author affinity, channel priority) are more stable than document-level features for personal communication search
What to Steal
- Work graph / interaction graph as LTR signal: applicable to any enterprise search where users have stable interaction patterns (email, docs, tickets)
- Position bias oversampling trick: simple to implement, high impact on pairwise LTR quality
- Two-stage architecture pattern with stage-appropriate feature sets