Building a Better Search Engine for Semantic Scholar
Source: https://medium.com/ai2-blog/building-a-better-search-engine-for-semantic-scholar-ea23a0b661e7
Author: Sergey Feldman (Senior Applied Research Scientist, AI2)
Summary
How AI2 built an ML reranker on top of Elasticsearch for Semantic Scholar (190M+ papers). The reranker raised the custom evaluation metric from 0.7 to 0.93 and increased clicks per query by 8% in an A/B test.
Technical Architecture
- First stage: Elasticsearch retrieval (190M papers indexed)
- Reranking layer: LightGBM with LambdaRank optimization
- Features: 22 engineered features, including query text matching across the title, abstract, venue, and author fields, plus KenLM language-model scoring
- Training data: human-annotated judgment lists
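The sketch below shows the two-stage flow in a minimal form. It assumes the first stage has already returned candidate papers. Several pieces are hypothetical stand-ins: the `Paper` fields, the token-overlap features, and the `WEIGHTS` dictionary. The real system uses 22 features and a trained LightGBM LambdaRank model rather than a hand-weighted linear score. A trained model could be built with, for example, `lightgbm.LGBMRanker(objective="lambdarank")` fit with per-query `group` sizes.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str
    venue: str
    authors: list[str]


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def field_match_features(query: str, paper: Paper) -> dict[str, float]:
    """Fraction of query tokens found in each field (hypothetical feature set)."""
    q = set(tokenize(query))
    fields = {
        "title": paper.title,
        "abstract": paper.abstract,
        "venue": paper.venue,
        "authors": " ".join(paper.authors),
    }
    if not q:
        return {name: 0.0 for name in fields}
    return {name: len(q & set(tokenize(text))) / len(q) for name, text in fields.items()}


# Hypothetical weights standing in for a trained LightGBM LambdaRank model.
WEIGHTS = {"title": 3.0, "abstract": 1.0, "venue": 0.5, "authors": 2.0}


def rerank(query: str, candidates: list[Paper]) -> list[Paper]:
    """Second stage: reorder first-stage candidates by a model score."""
    def score(p: Paper) -> float:
        feats = field_match_features(query, p)
        return sum(WEIGHTS[name] * value for name, value in feats.items())

    return sorted(candidates, key=score, reverse=True)
```

The linear score keeps the example dependency-free. You could replace `score` with a trained ranker's `predict` on the same feature vectors without changing the rest of the pipeline.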
Key Practical Innovations
1. Data Filtering
Removed roughly one third of the training data because it failed “sensibility checks” (comparisons of citation counts, recency, and textual matches). This yielded a 10-15% evaluation improvement: data quality mattered more than data quantity.
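The sketch below shows one way such a check could be expressed. The exact checks AI2 used are not given here, so both the rule and the `Candidate` fields are assumptions. The rule rejects a training pair when the paper labelled as better loses on citations, recency, and text match simultaneously.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    citations: int
    year: int
    text_match: float  # hypothetical: fraction of query tokens matched, 0..1


def is_sensible(preferred: Candidate, other: Candidate) -> bool:
    """Hypothetical sensibility check for a (preferred, other) training pair.

    Reject the pair when the paper labelled as better has fewer citations,
    is older, and matches the query text worse, all at once.
    """
    loses_everywhere = (
        preferred.citations < other.citations
        and preferred.year < other.year
        and preferred.text_match < other.text_match
    )
    return not loses_everywhere


def filter_pairs(
    pairs: list[tuple[Candidate, Candidate]],
) -> list[tuple[Candidate, Candidate]]:
    # Keep only pairs whose label is plausible given the simple signals.
    return [(a, b) for a, b in pairs if is_sensible(a, b)]
```

Requiring a loss on every signal keeps the filter conservative. Pairs where the signals disagree are kept, because the model may legitimately learn from them.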
2. Custom Evaluation Metric
Instead of computing standard NDCG on a held-out test set, the team built a manually annotated benchmark of 250 real queries with component-level criteria (author, venue, year, text). This benchmark was more predictive of production performance than generic metrics.
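Below is a hypothetical scorer for a component-level benchmark of this kind. Both the annotation schema and the scoring rule are illustrative assumptions, not the benchmark's actual definition. The schema is a `BenchmarkQuery` with an optional author, venue, year, and list of title terms. The scoring rule is the fraction of annotated criteria met by the top result, averaged over queries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    title: str
    authors: list[str]
    venue: str
    year: int


@dataclass
class BenchmarkQuery:
    query: str
    # Hypothetical annotation: components the top result is expected to match.
    author: Optional[str] = None
    venue: Optional[str] = None
    year: Optional[int] = None
    title_terms: list[str] = field(default_factory=list)


def criteria_met(q: BenchmarkQuery, r: Result) -> tuple[int, int]:
    """Return (criteria satisfied, criteria annotated) for one query/result."""
    met, total = 0, 0
    if q.author is not None:
        total += 1
        met += any(q.author.lower() in a.lower() for a in r.authors)
    if q.venue is not None:
        total += 1
        met += q.venue.lower() == r.venue.lower()
    if q.year is not None:
        total += 1
        met += q.year == r.year
    if q.title_terms:
        total += 1
        met += all(t.lower() in r.title.lower() for t in q.title_terms)
    return met, total


def benchmark_score(queries: list[BenchmarkQuery], top_results: list[Result]) -> float:
    """Mean per-query fraction of annotated criteria met by the top result."""
    scores = []
    for q, r in zip(queries, top_results):
        met, total = criteria_met(q, r)
        if total:
            scores.append(met / total)
    return sum(scores) / len(scores) if scores else 0.0
```

Scoring each component separately shows which kind of query (author, venue, year, or text) a model change helps or hurts, which a single NDCG number hides.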
3. Post-hoc Rule Corrections
Rule-based fixes for known model errors: boost results whose year exactly matches a year in the query, and boost results that contain quoted query phrases.
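The sketch below applies post-hoc rules on top of the reranker's scores. Several details are hypothetical: the regexes, the `Scored` record, and the boost sizes `YEAR_BOOST` and `PHRASE_BOOST`. The source only states that exact year matches and quoted phrase matches are boosted.

```python
import re
from dataclasses import dataclass

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")  # four-digit years 1900-2099
PHRASE_RE = re.compile(r'"([^"]+)"')         # text inside double quotes

YEAR_BOOST = 1.0    # hypothetical boost size
PHRASE_BOOST = 2.0  # hypothetical boost size


@dataclass
class Scored:
    title: str
    abstract: str
    year: int
    score: float  # score assigned by the ML reranker


def apply_rules(query: str, results: list[Scored]) -> list[Scored]:
    """Add rule-based boosts to model scores, then re-sort."""
    years = {int(y) for y in YEAR_RE.findall(query)}
    phrases = [p.lower() for p in PHRASE_RE.findall(query)]

    def adjusted(r: Scored) -> float:
        s = r.score
        if r.year in years:
            s += YEAR_BOOST
        text = f"{r.title} {r.abstract}".lower()
        if phrases and all(p in text for p in phrases):
            s += PHRASE_BOOST
        return s

    return sorted(results, key=adjusted, reverse=True)
```

Keeping the rules outside the model makes each correction easy to inspect and switch off. The trade-off is that the boost sizes must be tuned by hand against the evaluation benchmark.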
Results
- Custom metric: 0.7 → 0.93
- A/B test: +8% clicks per query, +9% MRR
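MRR averages the reciprocal rank of the first relevant result per query: MRR = (1/|Q|) × Σ 1/rank_i. The sketch below assumes that a click is the relevance signal and that queries with no click contribute 0. How the A/B test handled such queries is not stated. The sketch also computes relative change; for example, the custom metric's move from 0.7 to 0.93 is roughly a 33% relative gain.

```python
from typing import Optional


def mean_reciprocal_rank(first_click_ranks: list[Optional[int]]) -> float:
    """MRR over queries.

    Each entry is the 1-based rank of the first clicked result for a query,
    or None if nothing was clicked (assumed to contribute 0).
    """
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_click_ranks if r is not None)
    return total / len(first_click_ranks)


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, e.g. 0.1 means +10%."""
    return (after - before) / before
```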
Lessons
- Evaluation metric design is as important as model design — see Judgment Lists
- Training data filtering beats more data
- Rules + ML beats ML alone
Related Concepts
People
- Doug Turnbull: author of a LambdaMART explainer referenced in the post