Home Depot Product Search Relevance

Overview

The Home Depot Product Search Relevance dataset was released by Home Depot for a Kaggle competition on product search relevance. It provides query–product pairs from the home improvement retail domain, each labeled with a crowdsourced relevance score.

Dataset Structure

  • Query: customer search query (e.g., “angle bracket”, “oscillating tool blade”)
  • Product: product title and description
  • Relevance score: mean of several crowdsourced annotator ratings on a 1–3 scale, so values can be fractional (e.g., 2.33)
  • Domain: home improvement products
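
As a rough illustration of the record layout described above, the following sketch parses a few judged pairs into typed records and checks that every score falls on the 1–3 scale. The column names (`query`, `product_title`, `product_description`, `relevance`), the `JudgedPair` class, and the sample rows are assumptions made for this example, not the layout of the released files.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class JudgedPair:
    """One query-product pair with its averaged relevance score."""
    query: str
    product_title: str
    product_description: str
    relevance: float  # mean annotator rating, expected in [1.0, 3.0]


# Made-up rows standing in for real data (hypothetical, for illustration only).
SAMPLE_CSV = """query,product_title,product_description,relevance
angle bracket,Steel Angle Bracket 2 in.,Zinc-plated corner brace for framing,3.0
oscillating tool blade,Cordless Drill Kit,18V drill with two batteries,1.33
"""


def load_pairs(handle):
    """Read judged pairs from a CSV file object, validating the score range."""
    pairs = []
    for row in csv.DictReader(handle):
        score = float(row["relevance"])
        if not 1.0 <= score <= 3.0:
            raise ValueError(f"relevance out of range: {score}")
        pairs.append(
            JudgedPair(
                query=row["query"],
                product_title=row["product_title"],
                product_description=row["product_description"],
                relevance=score,
            )
        )
    return pairs


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
for pair in pairs:
    print(pair.query, "->", pair.relevance)
```

Validating the range at load time is a design choice for the sketch: it surfaces malformed rows early instead of letting them skew a downstream regression target.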

Key Characteristics

The 1–3 continuous relevance scale (with decimal values from annotator averaging) differs from the discrete 0–4 grading in typical IR Judgment Lists. This makes it well-suited for regression-based relevance prediction rather than classification.
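
To make the regression framing concrete, here is a minimal sketch that scores predictions against averaged labels with root mean squared error (RMSE), a common choice for continuous targets. Clipping predictions to the label range is an assumption of this example rather than something the dataset prescribes, and the numbers are invented.

```python
import math


def clip_to_scale(pred, low=1.0, high=3.0):
    """Clamp a raw model output to the label range (an assumed post-processing step)."""
    return max(low, min(high, pred))


def rmse(y_true, y_pred):
    """Root mean squared error between labels and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


labels = [3.0, 2.33, 1.0]        # invented averaged annotator scores
raw_preds = [3.4, 2.0, 0.5]      # invented regressor outputs, some off-scale
clipped = [clip_to_scale(p) for p in raw_preds]

print("RMSE raw:    ", round(rmse(labels, raw_preds), 4))
print("RMSE clipped:", round(rmse(labels, clipped), 4))
```

Because no label lies outside 1–3, clipping can only move an off-scale prediction closer to its label, so it never increases the error.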

The home improvement domain has distinctive challenges:

  • Technical terminology (product codes, specifications, measurements)
  • Queries mixing product type + attribute (e.g., “3/4 inch PVC elbow”)
  • User intent often includes installation context not visible in product descriptions
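
Measurement-heavy queries such as “3/4 inch PVC elbow” are one place where light normalization can help text matching. The sketch below maps a few unit spellings to a canonical form and keeps fractions intact during tokenization. The alias table and the `normalize_units` and `tokenize` helpers are hypothetical and far from a complete unit vocabulary.

```python
import re

# Canonical unit -> spellings to fold into it, longest first so that
# "in." is handled before the bare "in". Illustrative, not exhaustive.
UNIT_ALIASES = {
    "in": ("inches", "inch", "in.", "in"),
    "ft": ("feet", "foot", "ft.", "ft"),
}


def normalize_units(text):
    """Lowercase the text and rewrite unit spellings that follow a digit."""
    text = text.lower()
    for canonical, aliases in UNIT_ALIASES.items():
        for alias in aliases:
            # The unit must directly follow a digit and must not run into
            # another letter (so "10 inside" is left alone).
            pattern = r"(\d)\s*" + re.escape(alias) + r"(?![a-z])"
            text = re.sub(pattern, r"\1 " + canonical, text)
    return text


def tokenize(text):
    """Split into tokens, keeping fractions such as 3/4 as a single token."""
    return re.findall(r"\d+/\d+|\d+(?:\.\d+)?|[a-z]+", text)


query = "3/4 inch PVC elbow"
print(normalize_units(query))
print(tokenize(normalize_units(query)))
```

Applying the same normalization to queries and product text means “3/4 inch” and “3/4 in.” yield identical tokens before any overlap feature is computed.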

Use Cases

  • Regression models predicting relevance scores
  • Feature engineering experiments (text matching, semantic similarity)
  • Benchmarking retrieval systems in a vertical domain
  • Transfer learning experiments (train on ESCI, evaluate on Home Depot or vice versa)
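
For the feature engineering use case, a simple starting point is lexical overlap between the query and each product field. The sketch below computes query-term coverage and Jaccard similarity. The feature names and the `match_features` helper are assumptions for illustration; a real pipeline would likely add stemming, spelling correction, or embedding-based similarity.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(query_tokens, field_tokens):
    """Fraction of query tokens that also appear in the field."""
    if not query_tokens:
        return 0.0
    return len(query_tokens & field_tokens) / len(query_tokens)


def jaccard(a, b):
    """Intersection over union of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_features(query, title, description):
    """Build a small dictionary of lexical matching features for one pair."""
    q, t, d = tokens(query), tokens(title), tokens(description)
    return {
        "query_len": len(q),
        "title_coverage": coverage(q, t),
        "desc_coverage": coverage(q, d),
        "title_jaccard": jaccard(q, t),
    }


features = match_features(
    "angle bracket",
    "Steel Angle Bracket 2 in.",
    "Zinc-plated corner brace",
)
print(features)
```

Features like these can feed any regressor trained on the relevance scores. Coverage is computed relative to the query rather than the field so that long descriptions do not dilute the signal.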

Comparison with Other Datasets

Dataset               Domain               Scale                            Label type
Amazon ESCI Dataset   General e-commerce   Very large                       4-class (ESCI)
WANDS Dataset         Home goods           ~233K pairs (~43K products)      3-class
Home Depot            Home improvement     ~74K pairs                       Continuous 1–3

Related

  • Judgment Lists — this dataset functions as a crowdsourced judgment list
  • NDCG — continuous scores can be discretized for NDCG computation
  • Learning to Rank — standard use case for this dataset
  • Amazon ESCI Dataset — larger counterpart for general e-commerce search evaluation
  • WANDS Dataset — comparable vertically-focused annotation dataset
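
One possible way to discretize the continuous scores for NDCG is sketched below. The mapping from the 1–3 scale onto integer grades 0–4 (`to_grade`) is an assumption chosen to line up with conventional graded judgments, and the exponential gain with a log2 discount is one common NDCG formulation, not the only one.

```python
import math


def to_grade(score):
    """Map a 1-3 averaged score to an integer grade 0-4 (assumed mapping)."""
    if not 1.0 <= score <= 3.0:
        raise ValueError(f"score out of range: {score}")
    # Stretch 1-3 onto 0-4 and round half up.
    return int(math.floor((score - 1.0) * 2.0 + 0.5))


def dcg(grades):
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg(scores_in_ranked_order, k=None):
    """NDCG of a ranking, given the label of each result in ranked order."""
    grades = [to_grade(s) for s in scores_in_ranked_order]
    ideal = sorted(grades, reverse=True)
    if k is not None:
        grades, ideal = grades[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(grades) / best if best > 0 else 0.0


# Labels of the results a hypothetical system returned, top result first.
ranking = [3.0, 1.0, 2.33]
print("grades:", [to_grade(s) for s in ranking])
print("NDCG:  ", round(ndcg(ranking), 4))
```

Coarser mappings (for example, three bins) are equally defensible. The choice changes how strongly near-misses are penalized, so it is worth reporting alongside any NDCG figure.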

Source