Symmetric vs. Asymmetric Semantic Search

A critical distinction that determines which embedding model to use.

Symmetric semantic search

The query and the corpus entries are about the same length and carry the same amount of content.

Example: finding similar questions

  • Query: “How to learn Python online?”
  • Match: “How to learn Python on the web?”

You could swap the query and a corpus entry, and the task would still make sense.

  • Training example: Quora Duplicate Questions
  • Suitable models: Pre-Trained Sentence Embedding Models
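The matching step can be sketched in a few lines of Python. This is a minimal sketch that assumes a hypothetical toy bag-of-words function, `toy_embed`, in place of a real pre-trained sentence embedding model, and it scores pairs with cosine similarity. Because the same encoder embeds both sides and cosine similarity is symmetric, swapping the query and the corpus entry leaves the score unchanged, which mirrors the swap test for symmetric tasks.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence embedding model: a bag-of-words
    count vector. A real model would return a dense vector instead."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = "How to learn Python online?"
corpus = [
    "How to learn Python on the web?",
    "What is the capital of France?",
]

# Embed query and corpus with the SAME encoder, then rank by cosine similarity.
q_vec = toy_embed(query)
scores = [cosine(q_vec, toy_embed(doc)) for doc in corpus]
best = corpus[max(range(len(corpus)), key=lambda i: scores[i])]
print(best)  # How to learn Python on the web?

# Swapping the roles of query and corpus entry gives the same score.
swapped = cosine(toy_embed(corpus[0]), q_vec)
print(math.isclose(swapped, scores[0]))  # True
```

A real pre-trained model would replace `toy_embed`; the point of the sketch is only that one shared encoder plus a symmetric similarity makes the two sides interchangeable.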

Asymmetric semantic search

A short query (a question or a few keywords) is matched against a longer paragraph that answers it.

Example: finding a passage that answers a question

  • Query: “What is Python?”
  • Match: “Python is an interpreted, high-level and general-purpose programming language…”

Swapping the query and a corpus entry usually does not make sense.

  • Training example: MS MARCO
  • Suitable models: Pre-Trained MS MARCO Models
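A toy illustration of why swapping breaks down in the asymmetric case. The `coverage` function below is a hypothetical asymmetric score (the fraction of query words that appear in the passage), used only to make the asymmetry visible; it is not how MS MARCO models score relevance, since those models learn relevance from real query/passage pairs.

```python
def tokens(text: str) -> set[str]:
    """Toy tokenizer: lowercased words with simple punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()} - {""}

def coverage(query: str, passage: str) -> float:
    """Hypothetical asymmetric score: share of query words found in the passage."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)

query = "What is Python"
passages = [
    "Python is an interpreted, high-level and general-purpose programming language",
    "Paris is the capital and most populous city of France",
]

# Rank the longer passages against the short query.
ranked = sorted(passages, key=lambda p: coverage(query, p), reverse=True)
print(ranked[0])  # the Python passage

# Reading the pair in the other direction gives a very different score.
print(round(coverage(query, passages[0]), 3))  # 0.667
print(round(coverage(passages[0], query), 3))  # 0.222
```

The Python passage covers 2 of the 3 query words (2/3), but in the reverse direction the query covers only 2 of the passage's 9 words (2/9), so the two roles are not interchangeable.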

Key rule

Choose the right model for your type of task. Using a symmetric model for asymmetric search (or vice versa) significantly degrades retrieval quality.
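If you are unsure which case applies, one rough first check follows the length criterion from the symmetric description. The sketch below is a hypothetical rule of thumb, not part of any library: it compares the median word counts of queries and corpus entries against an arbitrary `max_ratio` threshold. Length is only a proxy; asking whether query and corpus entries could be swapped remains the more reliable test.

```python
from statistics import median

def looks_symmetric(queries: list[str], corpus: list[str], max_ratio: float = 2.0) -> bool:
    """Hypothetical heuristic: True if the median query length and the median
    corpus-entry length (in words) are within max_ratio of each other.
    The default of 2.0 is an arbitrary assumption, not a recommended value."""
    q_len = median(len(q.split()) for q in queries)
    c_len = median(len(c.split()) for c in corpus)
    # max(..., 1) avoids division by zero when entries are empty strings.
    ratio = max(q_len, c_len) / max(min(q_len, c_len), 1)
    return ratio <= max_ratio

similar_questions = looks_symmetric(
    ["How to learn Python online?"],
    ["How to learn Python on the web?"],
)
question_answering = looks_symmetric(
    ["What is Python"],
    ["Python is an interpreted, high-level and general-purpose programming language"],
)
print(similar_questions)   # True  (5 vs 7 words)
print(question_answering)  # False (3 vs 9 words)
```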