Amazon ESCI Dataset

Overview

The Shopping Queries Dataset, commonly referred to as the ESCI dataset, is a large-scale e-commerce search benchmark released by Amazon. It provides human-annotated query–product pairs labeled with ESCI relevance grades and is designed for improving and evaluating product search systems.

ESCI Relevance Scale

Each (query, product) pair is labeled with one of four grades:

Grade | Label      | Meaning
E     | Exact      | The product directly matches the query
S     | Substitute | The product could substitute for the intent, but isn’t an exact match
C     | Complement | The product complements (goes with) the query intent
I     | Irrelevant | The product is not relevant to the query

This four-class schema is richer than binary relevance and captures typical e-commerce nuances that a relevant/not-relevant split would hide (e.g., accessories fall under Complement, near-misses under Substitute).
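To make the comparison with binary relevance concrete, here is a minimal sketch of the four grades as a Python enum. The names `ESCILabel`, `parse_label`, and `to_binary` are hypothetical helpers, and the rule that only Exact counts as relevant is one illustrative choice among several, not something the dataset prescribes.

```python
from enum import Enum


class ESCILabel(Enum):
    """The four ESCI relevance grades, keyed by their single-letter code."""
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENT = "C"
    IRRELEVANT = "I"


def parse_label(code: str) -> ESCILabel:
    """Convert a single-letter grade such as 'E' or 's' into an ESCILabel.

    Raises ValueError for anything that is not one of E, S, C, I.
    """
    return ESCILabel(code.strip().upper())


def to_binary(label: ESCILabel) -> bool:
    """Collapse the four grades to binary relevance.

    Assumption for illustration: only Exact counts as relevant.
    """
    return label is ESCILabel.EXACT


if __name__ == "__main__":
    for code in ["E", "S", "C", "I"]:
        label = parse_label(code)
        print(code, label.name, to_binary(label))
```

Under this binary collapse, Substitute, Complement, and Irrelevant all become the same value, which is exactly the distinction the four-class schema preserves.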

Key Facts

  • Large scale: roughly 2.6 million labeled pairs over about 130,000 queries in the full version
  • Multi-locale: covers English, Japanese, and Spanish
  • Product metadata: titles, descriptions, bullet points, product type
  • Designed for ranking, classification, and retrieval tasks

Supported Tasks

  1. Query–Product Ranking — rank products by relevance for a given query
  2. Query–Product Classification — predict the ESCI label for a (query, product) pair
  3. Product Substitute Identification — identify substitute products from the S-labeled pairs
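The three tasks above can all be derived from the same labeled pairs. The following is a rough sketch on a few made-up rows: the field names (`query`, `product_id`, `label`), the product IDs, the ordinal ordering E > S > C > I, and the class indices are all assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical toy judgments; real rows would come from the dataset files.
ROWS = [
    {"query": "iphone 13 case", "product_id": "P1", "label": "E"},
    {"query": "iphone 13 case", "product_id": "P2", "label": "I"},
    {"query": "iphone 13 case", "product_id": "P3", "label": "C"},
    {"query": "iphone 13 case", "product_id": "P4", "label": "S"},
]

# Assumed ordinal ordering used only to sort products for the ranking task.
GRADE_ORDER = {"E": 3, "S": 2, "C": 1, "I": 0}
# Assumed class indices for the four-way classification task.
CLASS_INDEX = {"E": 0, "S": 1, "C": 2, "I": 3}


def ranking_targets(rows):
    """Task 1: for each query, product IDs sorted from most to least relevant."""
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["query"]].append(row)
    return {
        query: [r["product_id"] for r in
                sorted(items, key=lambda r: GRADE_ORDER[r["label"]], reverse=True)]
        for query, items in by_query.items()
    }


def classification_targets(rows):
    """Task 2: one (query, product_id, class_index) triple per pair."""
    return [(r["query"], r["product_id"], CLASS_INDEX[r["label"]]) for r in rows]


def substitute_targets(rows):
    """Task 3: one (query, product_id, is_substitute) triple per pair."""
    return [(r["query"], r["product_id"], r["label"] == "S") for r in rows]


if __name__ == "__main__":
    print(ranking_targets(ROWS))
    print(classification_targets(ROWS))
    print(substitute_targets(ROWS))
```

Keeping the three target builders separate reflects that the tasks differ only in how the same grade is read: as an ordering, as a class, or as a yes/no flag.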

Use in Search Evaluation

The ESCI dataset serves as a public judgment list for benchmarking retrieval and ranking models. Because its labels are graded rather than binary, it supports NDCG evaluation natively: each grade can be mapped to a gain value without any additional annotation.
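As a sketch of how graded labels feed into NDCG, the snippet below scores a ranked list of ESCI grades. The gain values in `GAIN` are an illustrative assumption (any monotone mapping with E highest would work), and the function names are hypothetical.

```python
import math

# Assumed gain per grade; adjust to suit the evaluation at hand.
GAIN = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}


def dcg(labels, k=None):
    """Discounted cumulative gain of grades listed in ranked order (best rank first)."""
    if k is not None:
        labels = labels[:k]
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(GAIN[label] / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))


def ndcg(labels, k=None):
    """DCG normalized by the DCG of the ideal (gain-sorted) ordering of the same labels."""
    ideal = sorted(labels, key=lambda label: GAIN[label], reverse=True)
    ideal_dcg = dcg(ideal, k)
    # A query with no positive-gain products has no meaningful ranking; score it 0.
    return dcg(labels, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Grades of the products a system returned for one query, in ranked order.
    print(ndcg(["E", "S", "C", "I"]))  # already in ideal order
    print(ndcg(["I", "E"]))            # exact match pushed to rank 2
```

With binary labels the gain table would reduce to two values; the graded table is what lets a ranking that places a Substitute above an Irrelevant product score higher than the reverse.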

See ESCI-S Dataset for extended metadata built on top of this dataset.

Source