Metadata: The 3rd Kind of Retrieval

Author: Doug Turnbull

Summary

Search discussions focus on lexical vs. embedding retrieval. But there’s a third retrieval philosophy that’s often overlooked: metadata retrieval — ranking based on structured attribute matching. With LLMs, it’s now easier than ever to extract attributes from queries and implement this approach.

The Three Kinds of Retrieval

  1. Lexical — term matching (BM25)
  2. Embedding — semantic similarity
  3. Metadata — structured attribute matching

How Metadata Retrieval Works

A query like "crimson suede couch" can be parsed into structured attributes:

Color    Material           Category
Red      Leather > Suede    Furniture > Living Room > Couches

Products are then ranked by how closely their attributes match the specification:

Product                 Color    Material             Category                             Rank
Red suede sofa          Red      Suede                Furniture > Living Room > Couches    1st
Pink barbie recliner    Pink     Imitation leather    Recliners                            2nd
Mr. Beast neon desk     Green    Wood                 Office > Desks                       Excluded

Each attribute has its own similarity function (e.g., color similarity uses primary/secondary color relationships).
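
One way to picture per-attribute similarity is the minimal Python sketch below. It is not the author's implementation: the similarity values, the lookup tables, the averaging of scores, and the rule that a zero similarity on any attribute excludes a product are all assumptions made for illustration.

```python
from dataclasses import dataclass

# All similarity values below are illustrative assumptions, not from the source.
# Keys are (query value, product value) pairs.
COLOR_SIM = {("red", "red"): 1.0, ("red", "pink"): 0.5}
MATERIAL_SIM = {("suede", "suede"): 1.0, ("suede", "imitation leather"): 0.5}
CATEGORY_SIM = {("couches", "couches"): 1.0, ("couches", "recliners"): 0.5}


@dataclass(frozen=True)
class Attributes:
    color: str
    material: str
    category: str  # leaf of the category path, e.g. "couches"


def similarities(query: Attributes, product: Attributes) -> list[float]:
    """One similarity per attribute; unknown pairs score 0.0."""
    return [
        COLOR_SIM.get((query.color, product.color), 0.0),
        MATERIAL_SIM.get((query.material, product.material), 0.0),
        CATEGORY_SIM.get((query.category, product.category), 0.0),
    ]


def rank(query: Attributes, catalog: dict[str, Attributes]) -> list[str]:
    """Return product names, best match first, dropping excluded products."""
    scored = []
    for name, attrs in catalog.items():
        sims = similarities(query, attrs)
        if min(sims) == 0.0:  # assumed rule: any zero similarity excludes
            continue
        scored.append((sum(sims) / len(sims), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


QUERY = Attributes("red", "suede", "couches")
CATALOG = {
    "Pink barbie recliner": Attributes("pink", "imitation leather", "recliners"),
    "Red suede sofa": Attributes("red", "suede", "couches"),
    "Mr. Beast neon desk": Attributes("green", "wood", "desks"),
}
print(rank(QUERY, CATALOG))  # ['Red suede sofa', 'Pink barbie recliner']
```

Because each attribute has its own table, a disagreement about the ranking can be traced to a single entry, such as how close imitation leather should be to suede.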

Why Metadata Retrieval Matters

Explainable Ranking

Metadata retrieval produces ranking you can have a conversation about:

  • “Would imitation leather be acceptable for a suede search?”
  • “Should ankle-length leggings rank above calf-length?”

This turns ranking from an abstract black box into something you can test and explain to stakeholders.

Testable Without Extensive User Evals

Attribute-based ranking can be unit-tested:

  • “These tangential financial reports may show below earnings reports, but they must be about company XYZ”
  • “Show ankle-length first, then calf-length, exclude knee-length”

You don’t need user behavior data to discover what’s broken — stakeholders can tell you directly.
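
The leggings requirement could be written as a test along these lines. This is a hedged sketch: `rank_by_length`, the preference table, and the product names are hypothetical stand-ins for a real ranker and catalog.

```python
# Hypothetical preference table for a query asking for ankle-length leggings:
# a lower number ranks higher; lengths missing from the table are excluded.
ANKLE_QUERY_PREFERENCE = {"ankle": 0, "calf": 1}


def rank_by_length(products: list[tuple[str, str]]) -> list[str]:
    """Order (name, length) pairs by preference and drop excluded lengths."""
    kept = [
        (ANKLE_QUERY_PREFERENCE[length], name)
        for name, length in products
        if length in ANKLE_QUERY_PREFERENCE
    ]
    return [name for _, name in sorted(kept)]


PRODUCTS = [
    ("calf legging", "calf"),
    ("knee legging", "knee"),
    ("ankle legging", "ankle"),
]


def test_ankle_ranks_before_calf():
    ranked = rank_by_length(PRODUCTS)
    assert ranked.index("ankle legging") < ranked.index("calf legging")


def test_knee_length_is_excluded():
    assert "knee legging" not in rank_by_length(PRODUCTS)
```

Each test function restates one stakeholder sentence, so a failing test points at the rule that was broken. The functions use plain assertions and can be collected by a runner such as pytest or called directly.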

LLMs Make It Easy Now

Query-to-attribute extraction used to require extensive NLP research. LLMs now make it straightforward — classifying queries and content into attributes without specialized models.
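
A rough sketch of query-to-attribute extraction follows. The prompt wording, the attribute names, and the `complete` callable are assumptions: `complete` stands in for whatever LLM client is in use, and `fake_llm` returns a canned reply so the example runs without a model.

```python
import json
from typing import Callable

ATTRIBUTES = ("color", "material", "category")

# Illustrative prompt; the exact wording is an assumption.
PROMPT_TEMPLATE = (
    "Extract the attributes color, material, and category from the search "
    "query below. Reply with a JSON object using exactly those keys; use null "
    "for anything the query does not mention.\n\nQuery: {query}"
)


def extract_attributes(query: str, complete: Callable[[str], str]) -> dict:
    """Ask the model for attributes and keep only the expected keys.

    `complete` is a placeholder for an LLM client: it takes a prompt string
    and returns the model's text reply.
    """
    reply = complete(PROMPT_TEMPLATE.format(query=query))
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        parsed = {}  # malformed reply: treat as "nothing extracted"
    if not isinstance(parsed, dict):
        parsed = {}
    return {key: parsed.get(key) for key in ATTRIBUTES}


def fake_llm(prompt: str) -> str:
    """Stand-in model with a canned reply, so the sketch runs offline."""
    return '{"color": "red", "material": "suede", "category": "couches"}'


print(extract_attributes("crimson suede couch", fake_llm))
```

Restricting the result to a fixed set of keys means extra or missing fields in the reply cannot leak into the ranking step.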

Doug notes a link to Agentic Search — with metadata retrieval, many semantic-seeming problems become solvable through attribute-based approaches, without needing dense embeddings. This may reduce the reliance on sophisticated retrieval in certain domains.
