Hiring for Search

Roles, skills, and evaluation criteria for building a search team. Search is a specialized enough domain that generic engineering hiring criteria miss what matters.

Roles on a Search Team

Relevance Engineer / Search Engineer

The core role. Owns the bridge between IR fundamentals and production systems.

Key skills:

Information retrieval foundations: BM25, inverted indexes, TF-IDF, precision/recall
Ranking system design: two-stage retrieval, LTR pipeline, feature engineering
Query analysis: query understanding, tokenization, synonym handling
Evaluation: building and running offline eval; interpreting NDCG, MAP, MRR
Search engine internals: Elasticsearch / Solr / Vespa / OpenSearch configuration and tuning

What separates good from great: ability to form a hypothesis about why a query fails, design a measurement to test it, and implement the fix — without breaking unrelated queries.
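As a rough illustration of the offline metrics named in the skills list above, the sketch below computes NDCG, reciprocal rank, and average precision for a single ranked list of graded judgments. The function names and the `judged` sample data are hypothetical. It assumes linear gain, and it normalizes using only the documents that were actually retrieved; a fuller harness would build the ideal ranking and the relevant-document count from the whole judgment list.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=10):
    """NDCG@k, normalized by the best ordering of the retrieved gains (simplification)."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

def reciprocal_rank(gains):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, g in enumerate(gains, start=1):
        if g > 0:
            return 1.0 / rank
    return 0.0

def average_precision(gains):
    """Mean of precision@rank over relevant results that were retrieved (simplification)."""
    hits, total = 0, 0.0
    for rank, g in enumerate(gains, start=1):
        if g > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical judgments: graded relevance (0-3) of each result, in ranked order.
judged = {
    "q1": [3, 0, 2, 0, 1],
    "q2": [0, 0, 1, 0, 0],
}

for query, gains in judged.items():
    print(query, round(ndcg(gains), 3), round(reciprocal_rank(gains), 3),
          round(average_precision(gains), 3))

# MAP, MRR, and mean NDCG are these per-query values averaged over the query set.
mean_ndcg = sum(ndcg(g) for g in judged.values()) / len(judged)
```

Keeping the per-query values around, rather than only the averages, is what makes it possible to check that a fix for one query has not regressed others.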

ML Engineer (Ranking / Retrieval)

Owns the model layer: learning-to-rank models, embedding models, re-rankers.

Key skills:

Feature engineering for ranking (query-document features, user context features)
Training pipelines: offline data prep, label collection, train/eval splits
Model serving: latency-conscious inference, ONNX export, model distillation
Vector search: embedding models, HNSW, quantization
Implicit feedback: click models, position bias correction, counterfactual learning

What separates good from great: understanding of the full data flywheel — how model decisions affect user behavior, which affects training data, which affects the next model.
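One concrete piece of that flywheel is position bias: results shown higher get more clicks simply because more users look at them. The sketch below shows inverse propensity weighting, one common correction. The `PROPENSITY` values and the log format are hypothetical; in practice the examination propensities have to be estimated, for example from a randomization experiment or a fitted click model.

```python
from collections import defaultdict

# Hypothetical probability that a user examines each rank position.
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3, 5: 0.2}

def ipw_click_scores(log):
    """Propensity-weighted click rate per document.

    log: iterable of (doc_id, position, clicked) tuples.
    Each click is weighted by 1 / propensity of the position it was shown at,
    so clicks earned at rarely examined positions count for more.
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for doc_id, position, clicked in log:
        impressions[doc_id] += 1
        if clicked:
            weighted_clicks[doc_id] += 1.0 / PROPENSITY[position]
    return {doc: weighted_clicks[doc] / impressions[doc] for doc in impressions}

# Document "a" is always shown first, document "b" always fifth.
log = [
    ("a", 1, True), ("a", 1, False),
    ("b", 5, True), ("b", 5, False), ("b", 5, False), ("b", 5, False),
]
scores = ipw_click_scores(log)
print(scores)
```

In this toy log, "a" has the higher raw click-through rate, but after weighting "b" scores higher, because its one click came from a position that few users examine.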

Search Data Scientist

Owns the measurement layer.

Key skills:

Experiment design: power analysis, randomization, statistical significance
Click modeling: position bias correction, interleaving analysis
Query log analysis: identifying patterns in head/torso/tail query distribution
Business metric definition: tying search changes to revenue, retention, task completion

What separates good from great: skepticism about easy wins; ability to detect when an A/B “win” is a measurement artifact rather than a real improvement.
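A common source of such artifacts is an underpowered test. As a minimal sketch of the power analysis mentioned above, the function below uses the standard normal-approximation formula for comparing two proportions to estimate how many observations each arm needs. The function name, the 30% baseline, and the lift values are hypothetical, and the formula assumes independent observations; if a test is randomized by user but analyzed per search, that assumption does not hold and the variance estimate needs adjusting.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate observations per arm to detect a relative lift in a rate.

    Two-sided test on two proportions, normal approximation:
    n = (z_{1 - alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical scenario: 30% of searches currently end in a click.
for lift in (0.01, 0.02, 0.05):
    print(f"{lift:.0%} relative lift -> {sample_size_per_arm(0.30, lift)} per arm")
```

The required sample grows roughly with the inverse square of the effect size, which is why small relevance improvements are hard to confirm on low-traffic search products.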

Search Product Manager

The rarest and hardest to hire.

Key skills:

Technical depth: can read a ranking formula, understand precision/recall tradeoffs, interpret eval results
Measurement orientation: treats every change as an experiment; comfortable saying “we don’t know yet”
Stakeholder translation: can explain why a relevance regression is a business problem to execs
Roadmap strategy: balances quick wins (head query fixes) with compound investments (infrastructure, models)

What separates good from great: ability to resist the pressure to ship features without measurement.

UX Researcher

Owns qualitative signal: user intent studies, usability testing, task analysis.

Provides the ground truth that quantitative metrics can’t — why users behave the way they do. Not always considered a “search role” but fills the gap when metrics look fine and users are still complaining.

When you need one: entering a new domain where intent patterns are unknown, or when engagement metrics and satisfaction surveys diverge.

Relevance Annotator / Judgment Analyst

Produces and maintains Judgment Lists — human-labeled relevance assessments that power offline evaluation.

Often treated as a commodity task, but annotation quality directly determines the quality of your eval harness. Annotators with domain knowledge are a compounding asset; bad annotations make your offline eval misleading.

Key skill: calibration — consistent, explainable ratings across a large query set. Annotation guidelines matter more than raw annotator quality.

What to Look For in Candidates

Evaluation Mindset

The single most important trait across all search roles. Candidates who have never thought about how to measure whether their changes worked are a red flag regardless of technical depth.

Interview signal: “Walk me through how you would know if a ranking change improved user experience.”

IR Fundamentals

Surprisingly rare even among experienced engineers. Many candidates have worked with search systems without understanding why they work.

Interview signal: “Explain BM25 and why it works better than raw term frequency for ranking.” Or: “What’s the difference between recall and precision, and when do you optimize for each?”
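For reference when assessing answers to the BM25 question, here is a small sketch of the per-term BM25 score in a common Okapi form, with the non-negative IDF variant. The function name and all corpus statistics are hypothetical. The two properties a strong answer usually covers are visible in the code: term frequency saturates instead of growing linearly, and the score is normalized by document length.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score contribution of one query term to one document.

    tf: occurrences of the term in the document
    doc_freq: number of documents in the corpus containing the term
    k1: controls how quickly term frequency saturates
    b: controls how strongly long documents are penalized
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Hypothetical corpus: 100,000 documents, the term appears in 1,000 of them.
corpus = dict(avg_doc_len=300, n_docs=100_000, doc_freq=1_000)

# Saturation: each extra occurrence adds less than the previous one.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_score(tf, doc_len=300, **corpus), 3))

# Length normalization: the same tf counts for less in a longer document.
short_doc = bm25_term_score(3, doc_len=100, **corpus)
long_doc = bm25_term_score(3, doc_len=3000, **corpus)
```

With raw term frequency, a document that repeats a term 20 times would score 20 times higher than one that mentions it once; under BM25 the score can never exceed `idf * (k1 + 1)`, however often the term repeats.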

Debugging Instinct

Search failures are often subtle. A good search engineer can look at a failing query and reason about why — was it a tokenization issue, a missing synonym, a feature weight, a training data artifact?

Interview signal: present a real failing query from your system (anonymized if needed) and ask them to walk through their debugging process.

Scale Awareness

Solutions that work on 10k documents fail in different ways at 100M. Candidates should have intuition about the scaling properties of the systems they propose.

Common Hiring Mistakes

Hiring generic software engineers and expecting them to learn search. IR is a specialized field. The ramp time for someone with no background is 6-12 months before they’re net positive on relevance work.

Prioritizing ML credentials over IR fundamentals. A PhD in ML who has never thought about recall or precision will spend months building intuition that an experienced relevance engineer already has.

Not testing for evaluation mindset. Engineers who haven’t worked with eval harnesses will default to eyeballing queries, which doesn’t scale.

Undervaluing domain expertise. For vertical search (medical, legal, e-commerce), domain knowledge is a multiplier on technical skill. A relevance engineer who understands product catalogs will outperform a technically stronger engineer who doesn’t.

Skipping the search PM hire. Teams that have no search PM default to engineering-driven roadmaps, which tend toward infrastructure at the expense of user-facing quality.

Seniority and Sequencing

For a team building from scratch, a suggested hiring order:

1. Senior relevance engineer: sets evaluation culture, architectural decisions, standards
2. Search PM: without this, the team will lack strategic direction
3. ML engineer: once you have an eval harness and a baseline, you can iterate on models
4. Data scientist: once you have enough traffic to run experiments
5. Junior engineers: after you have enough structure to onboard and mentor

Hiring data scientists before you have an eval harness is premature — there’s nothing for them to measure.

Interview Process Recommendations

Take-home relevance task: give a small document corpus, a set of queries, and ask them to improve rankings with any approach they choose. Evaluate methodology, not just output.
System design: “Design a search system for [your domain] from scratch — what are the components and tradeoffs?”
Failure case walkthrough: review a real or synthetic failing query together and ask for reasoning
Metrics discussion: “What metric would you use to evaluate this change, and why?”

Related

Managing a Search Team: how to structure the team once hired
Economics of Search: headcount is a major cost driver
Daniel Tunkelang: canonical writing on search team composition and leadership
