Search Problem Archetypes

A diagnostic framework for naming the search problem you actually have before choosing a solution. The premise, drawn from Atita Arora’s “Before You Fix Your Search, Know What’s Actually Broken” and Russell Ackoff’s systems thinking, is that most search problems are not unique: they are instances of a small number of recurring patterns. The most valuable thing you can do before investing in LTR, vector search, or a new model is to correctly identify which pattern you are in, because the archetype tells you which layer to fix first (data model vs. ranking vs. evaluation vs. architecture).

Why Diagnosis Comes Before Solution

Teams self-diagnose with comfortable, sophisticated-sounding answers — “we need better ranking”, “we need vector search”, “we need a better model”. But the problem definition is usually a symptom, not a diagnosis. The most common foundation failure: a team has built sophisticated retrieval but never formally defined relevance — no evaluation framework, no judgment lists, no shared definition of a good result. The model optimizes a signal nobody verified maps to user satisfaction. This is a measurement / alignment problem, and no ranking sophistication fixes it.
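To make "formally defined relevance" concrete, here is a minimal sketch of a judgment list and one metric computed from it. The queries, document ids, grading scale, and the `precision_at_k` helper are all hypothetical illustrations, not part of the framework; a real program would choose its own scale and metrics.

```python
from typing import Dict, List

# Hypothetical judgment list: query -> {doc_id: grade}.
# Assumed grading scale: 0 = irrelevant, 1 = acceptable, 2 = ideal.
JUDGMENTS: Dict[str, Dict[str, int]] = {
    "ceramic bowl": {"d1": 2, "d2": 0, "d3": 1, "d4": 2},
}


def precision_at_k(query: str, ranked_ids: List[str], k: int, min_grade: int = 1) -> float:
    """Fraction of the top-k results judged at least `min_grade`.

    Unjudged documents count as irrelevant, which is a deliberate
    (and debatable) assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    judged = JUDGMENTS.get(query, {})
    hits = sum(1 for doc_id in ranked_ids[:k] if judged.get(doc_id, 0) >= min_grade)
    return hits / k


# d2 is judged irrelevant, d1 is relevant, d9 has no judgment.
score = precision_at_k("ceramic bowl", ["d2", "d1", "d9", "d3"], k=3)
```

Even a list this small forces the team to write down what a good result is, which is the step the paragraph above says is usually skipped.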

“We fail more often because we solve the wrong problem than because we get the wrong solution to the right problem.” — Russell Ackoff

The 10 Archetypes

#	Archetype	Core tension	Typical domains	Fix-first layer
1	Uniqueness Problem	One-of-a-kind inventory; freshness existential; thin demand signals; vocabulary mismatch	Marketplaces, pre-owned, handmade, classifieds	Data modeling
2	Complexity Machine	Variant-heavy catalog; mixed known-item + exploratory intent; sponsored vs. organic	Retail, fashion, health, price comparison	Intent routing / ranking
3	Precision Mandate	Plausible-but-wrong is worse than nothing; provenance & scope correctness first-class	Legal, clinical, tax, regulatory	Precision, lineage, scope
4	Firehose	Recency vs. relevance tug-of-war; authority explicit; unpredictable volume	News, media, social, live events	Freshness & authority signals
5	Extraction	Consumer is a machine/agent/pipeline; structured extraction at low latency; schema drift	Financial news→events, feeds, research pipelines	Extraction precision / latency
6	Media Vault	Non-text catalog (image/video/audio); sparse/missing metadata; conceptual/mood queries	Stock media, DAM, archives, video	Metadata / multimodal layer
7	Knowledge Graph	Answer composite across documents & entity relationships; taxonomy/ontology quality	Procurement, talent networks, compliance, gastronomy	Ontology / entity modeling
8	Geospatial	Distance, proximity, bounding box, polygon as first-class relevance	Satellite/aerial, urban planning, mapping, govt	Spatial indexing / architecture
9	Q-commerce	Real-time relevance under operational constraints (delivery window, availability, geo)	Food delivery, ride-hailing, on-demand, logistics	Supply/operational alignment
10	Code Search	Inherits KG + precision + firehose + extraction; vocabulary mismatch; version correctness	Coding agents, version control, API docs, code review	Depends on inherited pattern
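The table can also be kept as a small lookup structure, for example in a diagnostic notebook or an internal checklist tool. The sketch below copies the archetype names and fix-first layers from the table; the `Archetype` class and the `fix_first_layer` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Archetype:
    number: int
    name: str
    fix_first: str  # the "Fix-first layer" column of the table


# Rows copied from the table above; only three columns are kept.
ARCHETYPES: List[Archetype] = [
    Archetype(1, "Uniqueness Problem", "Data modeling"),
    Archetype(2, "Complexity Machine", "Intent routing / ranking"),
    Archetype(3, "Precision Mandate", "Precision, lineage, scope"),
    Archetype(4, "Firehose", "Freshness & authority signals"),
    Archetype(5, "Extraction", "Extraction precision / latency"),
    Archetype(6, "Media Vault", "Metadata / multimodal layer"),
    Archetype(7, "Knowledge Graph", "Ontology / entity modeling"),
    Archetype(8, "Geospatial", "Spatial indexing / architecture"),
    Archetype(9, "Q-commerce", "Supply/operational alignment"),
    Archetype(10, "Code Search", "Depends on inherited pattern"),
]

_BY_NAME: Dict[str, Archetype] = {a.name.lower(): a for a in ARCHETYPES}


def fix_first_layer(name: str) -> str:
    """Return the layer to fix first for a named archetype."""
    try:
        return _BY_NAME[name.strip().lower()].fix_first
    except KeyError:
        raise ValueError(f"unknown archetype: {name!r}") from None
```

For instance, `fix_first_layer("Firehose")` returns `"Freshness & authority signals"`.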

How to Know Which One You’re In (selected tells)

Uniqueness — standard taxonomies don’t fit (“handcrafted ceramic bowl” ≠ “kitchenware”); an item sold is gone.
Complexity Machine — Master/Variant SKUs; a single pipeline tuned for neither known-item nor exploratory intent.
Precision Mandate — recall isn’t enough; wrong jurisdiction / wrong revision / wrong rate range is a hard fail.
Firehose — most relevant doc is years old while the most recent has no topical reference; query volume spikes on events.
Extraction — traditional relevance evals don’t apply; failure to extract has compounding downstream consequences.
Media Vault — asset names are random/numeric with no relation to content; lexical layer still doing most of the work.
Knowledge Graph — users browse more than search; nested relationships (supplier → product → certification → regulation).
Geospatial — bounding box / radius / polygon queries not served by standard search; coordinate-system errors at query time.
Q-commerce — supply-side fulfilment confused with query-side sophistication; substitutes treated as edge case.
Code Search — query intent is ambiguous (file? function? class? call chain?); one concept hides behind several names, e.g. conn_retry / retryConnection / retry_on_connection_error.
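One hedged way to use the tells is as a checklist: record which ones you observe and see which archetypes they point to. The sketch below covers only a few archetypes, and the short tell labels (such as `taxonomy_misfit`) are hypothetical shorthand for the descriptions above, not terms from the framework. A raw count is a conversation starter, not a verdict.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical labels summarizing a subset of the tells listed above.
TELLS: Dict[str, Set[str]] = {
    "Uniqueness Problem": {"taxonomy_misfit", "sold_item_is_gone"},
    "Complexity Machine": {"master_variant_skus", "single_pipeline_mixed_intent"},
    "Precision Mandate": {"wrong_jurisdiction_is_hard_fail", "wrong_revision_is_hard_fail"},
    "Firehose": {"recency_vs_relevance", "event_driven_query_spikes"},
    "Geospatial": {"radius_or_polygon_queries", "coordinate_system_errors"},
}


def rank_archetypes(observed: Set[str]) -> List[Tuple[str, int]]:
    """Rank archetypes by how many of their tells were observed.

    Archetypes with no matching tells are dropped; ties are broken by name.
    """
    scores = {name: len(tells & observed) for name, tells in TELLS.items()}
    matched = [(name, score) for name, score in scores.items() if score > 0]
    return sorted(matched, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_archetypes(
    {"master_variant_skus", "single_pipeline_mixed_intent", "sold_item_is_gone"}
)
```

A result with more than one archetype is expected rather than an error: it suggests the system sits between patterns and that each candidate deserves a closer look.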

Operating Principles

Same symptom, different meaning. Zero results in e-commerce (Uniqueness / Complexity Machine) is wasted real estate — show substitutes, never a dead end. In a Precision Mandate, zero results is often better than a marginally-similar regulation or precedent. Same metric on the same dashboard, opposite prescription.
Systems sit between archetypes and outgrow them. A small marketplace landing a big retail partner suddenly inherits variants, suppliers, and sponsored-products problems — a shift of archetype, so the diagnosis should shift too.
The archetype names which layer to fix first, not how. Reranking won’t fix data-modeling gaps; a “better” embedding won’t fix sponsored-vs-organic business misalignment. Sequence matters; most teams skip to solutions before the problem is discovered.
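Measurable ≠ useful. A metric being easy to measure does not make it valid for your case: zero-result rate, for instance, can sit on the same dashboard in two systems and call for opposite actions.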
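The zero-results principle above can be sketched as an archetype-dependent policy. This is an illustrative sketch only: the function name, the message strings, and the idea of passing precomputed `substitutes` are assumptions, and a real system would decide fallback behavior with its own product and compliance rules.

```python
from typing import Dict, List, Union

Response = Dict[str, Union[List[str], str]]


def handle_zero_results(archetype: str, substitutes: List[str]) -> Response:
    """Decide what to return when the primary query matched nothing."""
    if archetype in {"Uniqueness Problem", "Complexity Machine"}:
        # E-commerce: an empty page is wasted real estate, so show substitutes.
        return {"results": list(substitutes),
                "message": "No exact match; showing similar items."}
    if archetype == "Precision Mandate":
        # Legal/clinical/tax: a marginally similar document can mislead,
        # so an explicit empty answer is the safer response.
        return {"results": [],
                "message": "No documents match this scope."}
    raise ValueError(f"no zero-results policy defined for {archetype!r}")


shop = handle_zero_results("Uniqueness Problem", ["bowl-17", "bowl-42"])
legal = handle_zero_results("Precision Mandate", ["similar-regulation"])
```

The same event, an empty result set, goes through one function and comes out with opposite behavior, which is why a shared "zero-result rate" target across both systems would be misleading.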

Agentic Retrieval Is Not an Archetype

Agentic retrieval is a consumption pattern that sits on top of whichever archetype the system already belongs to (a financial-research agent → Extraction; a legal-review agent → Precision Mandate). What changes is the evaluation contract: a wrong result a human would catch and ignore becomes a wrong action an agent executes autonomously. Precision, result confidence, and scope correctness become hard requirements, not nice-to-haves.
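A minimal sketch of the stricter evaluation contract, assuming each hit carries a calibrated confidence score and a scope label (for example a jurisdiction). The `Hit` fields, the `results_for_consumer` function, and the 0.8 threshold are hypothetical; the source does not prescribe a mechanism.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    doc_id: str
    confidence: float  # assumed calibrated score in [0, 1]
    scope: str         # assumed scope label, e.g. a jurisdiction code


def results_for_consumer(
    hits: List[Hit],
    consumer: str,
    required_scope: str,
    min_confidence: float = 0.8,  # illustrative threshold, not a recommendation
) -> List[Hit]:
    """Apply a hard scope and confidence gate when the consumer is an agent.

    A human can catch and ignore a wrong result, so humans see every hit.
    An agent may act on whatever it receives, so out-of-scope or
    low-confidence hits are withheld.
    """
    if consumer == "human":
        return list(hits)
    return [
        h for h in hits
        if h.scope == required_scope and h.confidence >= min_confidence
    ]


hits = [Hit("a", 0.95, "DE"), Hit("b", 0.95, "FR"), Hit("c", 0.40, "DE")]
for_agent = results_for_consumer(hits, "agent", required_scope="DE")
```

Here the agent receives only hit `a`, while a human reviewer would still see all three.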

Related Topics

E-commerce Search — spans the Uniqueness Problem and the Complexity Machine
Enterprise Search — frequently a Knowledge Graph or Precision Mandate problem
Relevance Program Setup — building the evaluation foundation whose absence drives most misdiagnoses
Conversational and Agentic Search — agentic retrieval as a consumption pattern over archetypes

Related Concepts

Search Evaluation · Judgment Lists — the foundation most teams skip
Zero Results — canonical “same metric, opposite prescription” example
Knowledge Graph Search · Precision and Recall · Search Intent · Query Types

Articles

Before You Fix Your Search, Know What’s Actually Broken — Atita Arora; source of this framework
Search - Intent Not Inventory — complementary “name the real problem” framing
Deconstructing E-Commerce Search - The 12 Query Types — query-level taxonomy within the e-commerce archetypes

People

Atita Arora — author of the archetype framework
Udi Manber — “search is essentially a solved problem” misperception
