Personalization

Definition

Personalization adapts search results to individual user preferences, history, and context — returning different results for different users submitting the same query, based on signals like their browsing history, purchase behavior, location, and demographics.

Why Personalize?

The same query “shoes” means different things to different users:

  • A runner → technical running shoes
  • A parent → children’s shoes
  • A fashion-conscious shopper → trendy sneakers
  • A formal dresser → dress shoes

Generic ranking serves the “average user” — personalization serves the actual user.

Personalization Signals

Signal Type            Examples                             Freshness
Explicit preferences   Saved searches, wishlists            Static
Purchase history       Bought brands, price range           Medium
Browse history         Categories, products viewed          Fresh
Search history         Previous queries, clicked results    Fresh
Demographics           Age, gender, location                Static
Session context        Current session queries              Real-time
Social proof           Friends’ purchases                   Medium

Personalization Approaches

Re-ranking

Run generic retrieval → re-rank based on user preferences.

  • Simple to implement (add user features to Learning to Rank)
  • Doesn’t change the candidate set, so it cannot personalize recall: an item the generic retrieval misses is never surfaced
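
A minimal sketch of the re-ranking step, assuming the generic retrieval stage has already produced scored candidates. The Candidate fields, the affinity dictionaries, the weight parameter, and the sample data are all hypothetical; a production system would usually feed such features into a trained Learning to Rank model instead of a hand-tuned sum.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    base_score: float  # score from the generic retrieval stage
    brand: str
    category: str


def rerank(candidates, brand_affinity, category_affinity, weight=0.5):
    """Re-order an already-retrieved candidate list using user affinities.

    The candidate set itself is untouched: nothing is added or removed.
    """
    def personalized_score(c):
        boost = brand_affinity.get(c.brand, 0.0) + category_affinity.get(c.category, 0.0)
        return c.base_score + weight * boost

    return sorted(candidates, key=personalized_score, reverse=True)


# Made-up candidates for the query "shoes"
candidates = [
    Candidate("dress-oxford", 1.00, "BrandA", "formal"),
    Candidate("trail-runner", 0.90, "Nike", "running"),
    Candidate("kids-sandal", 0.85, "BrandB", "kids"),
]
# Made-up affinities for a user who mostly buys running gear
runner_brand = {"Nike": 0.6}
runner_category = {"running": 0.8}

reranked = rerank(candidates, runner_brand, runner_category)
# The running shoe moves to the top; the other two keep their relative order.
```

The output has the same items as the input, which is the recall limitation noted above: re-ranking can only reorder what retrieval already found.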

Query Rewriting

Modify the query before retrieval based on user profile:

  • “shoes” → “Nike women’s running shoes” (based on user history)
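  • Can improve both recall and precision, because retrieval itself targets the inferred intent; the risk is over-narrowing when the inferred intent is wrong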
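
One way this rewrite could look, as a sketch only. It assumes the user profile is a mapping from term to an affinity score in [0, 1]; the rewrite_query function, the min_affinity threshold, and the profile values are hypothetical.

```python
def rewrite_query(query, profile, min_affinity=0.5):
    """Prepend the user's strongest profile terms to a short, ambiguous query."""
    terms = query.split()
    existing = {t.lower() for t in terms}
    # Keep only confident preferences, strongest first, skipping terms already present
    extras = [
        term
        for term, affinity in sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
        if affinity >= min_affinity and term.lower() not in existing
    ]
    return " ".join(extras + terms)


# Made-up profile derived from a user's history
profile = {"Nike": 0.9, "women's": 0.8, "running": 0.7, "sandals": 0.1}
rewritten = rewrite_query("shoes", profile)
# rewritten == "Nike women's running shoes"
```

The threshold is the guard against over-narrowing: weak signals such as "sandals" stay out of the rewritten query.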

Embedding-Based Personalization

Airbnb’s approach: learn user embeddings alongside listing embeddings in the same space.

  • Query embedding → user-contextualized embedding
  • Personalized ANN search over listing embeddings
  • Published as “Listing Embeddings in Search Ranking”
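
The idea of searching a shared embedding space with a user-contextualized query can be sketched as follows. This is not Airbnb's published method: the linear blend in contextualize, the alpha weight, and the three-dimensional vectors are assumptions for illustration, and an exact brute-force cosine search stands in for a real ANN index.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def contextualize(query_vec, user_vec, alpha=0.5):
    """Blend query and user vectors; alpha is the weight on the query."""
    blended = [alpha * q + (1 - alpha) * u for q, u in zip(query_vec, user_vec)]
    return normalize(blended)


def top_k(query_vec, listing_vecs, k=2):
    """Exact nearest neighbours by cosine similarity (stand-in for an ANN index)."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), listing_id)
        for listing_id, vec in listing_vecs.items()
    ]
    return [listing_id for _, listing_id in sorted(scored, reverse=True)[:k]]


# Made-up listing embeddings
listings = {
    "beach-villa": [0.9, 0.1, 0.0],
    "city-loft": [0.1, 0.9, 0.0],
    "mountain-cabin": [0.0, 0.2, 0.9],
}
query_vec = [0.6, 0.4, 0.0]   # generic query embedding
hiker_vec = [0.0, 0.1, 0.9]   # user whose history points toward cabins

generic = top_k(query_vec, listings)
personalized = top_k(contextualize(query_vec, hiker_vec), listings)
# generic      == ["beach-villa", "city-loft"]
# personalized == ["mountain-cabin", "beach-villa"]
```

Unlike re-ranking, the personalized search changes the candidate set itself: the cabin never appears in the generic top two.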

Feature Engineering for Personalized LTR

Add user features to the Learning to Rank model:

  • User’s average purchase price range
  • User’s brand affinities
  • User’s category preferences
  • Session recency signals
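
A sketch of how the first three features might be derived from a purchase log and turned into user-item features for the ranker. The record keys (price, brand, category), both function names, and the sample purchases are hypothetical; session recency signals are left out for brevity.

```python
from collections import Counter
from statistics import mean


def user_features(purchases):
    """Derive simple per-user features from a list of purchase records."""
    if not purchases:
        return {"avg_price": 0.0, "brand_affinity": {}, "category_affinity": {}}

    total = len(purchases)
    brands = Counter(p["brand"] for p in purchases)
    categories = Counter(p["category"] for p in purchases)
    return {
        "avg_price": mean(p["price"] for p in purchases),
        # Share of purchases per brand / category, in [0, 1]
        "brand_affinity": {b: n / total for b, n in brands.items()},
        "category_affinity": {c: n / total for c, n in categories.items()},
    }


def pair_features(features, item):
    """User-item features that could be appended to an LTR feature vector."""
    return [
        features["brand_affinity"].get(item["brand"], 0.0),
        features["category_affinity"].get(item["category"], 0.0),
        abs(item["price"] - features["avg_price"]),  # distance from usual spend
    ]


# Made-up purchase log
purchases = [
    {"price": 100.0, "brand": "Nike", "category": "running"},
    {"price": 140.0, "brand": "Nike", "category": "running"},
    {"price": 60.0, "brand": "Nike", "category": "casual"},
    {"price": 100.0, "brand": "BrandB", "category": "casual"},
]
features = user_features(purchases)
vector = pair_features(features, {"price": 120.0, "brand": "Nike", "category": "running"})
# vector == [0.75, 0.5, 20.0]
```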

In-Engine Click Profile Matching (Kleinanzeigen / Vespa)

Kleinanzeigen moved user behavioral profiles entirely inside Vespa, eliminating the external orchestration layer. Click events fire as Vespa document updates, processed in-process by a document processor. Profiles decay over time, are L2-normalized, and are stored as Vespa documents keyed {userId}:category:{categoryId}. At query time: read profile → expand tokens via a relations similarity graph → fire a WAND query. No external profile store. No service hop.

This exemplifies retrieval-personalization colocation: profile maintenance and retrieval in the same platform removes service-boundary latency entirely.
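
The profile maintenance step (decay, add the click, L2-normalize, store under a per-category key) can be illustrated outside Vespa. The sketch below is plain Python, not the Vespa document processor API, and the half-life, the unit click weight, and the token names are assumptions; only the key layout comes from the description above. Token expansion and the WAND query are not shown.

```python
import math


def profile_key(user_id, category_id):
    # Key layout taken from the text: {userId}:category:{categoryId}
    return f"{user_id}:category:{category_id}"


def apply_click(profile, clicked_tokens, elapsed_days, half_life_days=7.0):
    """Decay existing weights, add the clicked tokens, then L2-normalize."""
    decay = 0.5 ** (elapsed_days / half_life_days)
    updated = {token: weight * decay for token, weight in profile.items()}
    for token in clicked_tokens:
        updated[token] = updated.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(w * w for w in updated.values()))
    return {t: w / norm for t, w in updated.items()} if norm else updated


# An in-memory dict stands in for the profile documents
store = {}
key = profile_key("u42", "bikes")
store[key] = apply_click(store.get(key, {}), ["gravel", "shimano"], elapsed_days=0)
store[key] = apply_click(store[key], ["gravel"], elapsed_days=7)
# "gravel" now outweighs "shimano", and the profile vector has unit length.
```

Normalizing after every update keeps profiles comparable between heavy and light users, which matters when the weights are later used directly as query term weights.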

See: From Elasticsearch to Vespa - Rebuilding the Kleinanzeigen Homepage Feed Part 1, Kleinanzeigen - Vespa Migration for Homepage Feed

Privacy Considerations

Personalization requires collecting and using personal data:

  • Compliance: GDPR requires a lawful basis such as consent and grants a right to erasure; CCPA grants rights to opt out and to have data deleted
  • User trust: explicit personalization signals (wishlists) are trusted; opaque behavioral targeting is not
  • Security: personalization profiles are sensitive data — must be protected

The 98 personal data points that Facebook reportedly used for ad targeting (The Washington Post, 2016) are an often-cited, and controversial, example of how far this can go.

Airbnb Case Study

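Airbnb’s personalization system (Mihajlo Grbovic and Haibin Cheng, “Real-time Personalization using Embeddings for Search Ranking at Airbnb”, KDD 2018):

  • Trains user embeddings from booking sequences
  • Trains listing embeddings from viewing sequences
  • Online: user embedding → approximate nearest neighbor search over listings
  • Result: a reported 21% increase in click-through rate on the Similar Listings carousel in an A/B test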

The key insight: users’ preferences are encoded in their past interactions rather than in the profile fields they fill out. Embeddings capture these implicit preferences.

Elastic Governed Personalization (Ecommerce)

Two mechanisms extending a governance control plane without replacing it:

  1. Purchase history boosting: query the user_purchases index in parallel with governance evaluation; weight each boost by logarithmic purchase frequency and exponentially decaying recency; governance still controls what appears
  2. Cohort-aware policy activation: cohort policies are stored in the same policy engine; a terms filter such as {"cohorts": ["_all", "vegan"]} narrows the candidate policy set
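
Both mechanisms can be sketched briefly. The exact weighting formula is not given above, so the product of log(1 + count) and a half-life decay is an assumption, as are the function names and the 30-day half-life. The filter body follows the standard Elasticsearch terms query shape, with the cohorts field taken from the description.

```python
import math


def purchase_boost(purchase_count, days_since_last, half_life_days=30.0):
    """Logarithmic frequency weight times exponential recency decay."""
    frequency = math.log1p(purchase_count)  # log(1 + count)
    recency = math.exp(-math.log(2) * days_since_last / half_life_days)
    return frequency * recency


def cohort_policy_filter(user_cohorts):
    """Terms filter selecting policies for everyone plus the user's cohorts."""
    return {"terms": {"cohorts": ["_all", *user_cohorts]}}


fresh = purchase_boost(3, days_since_last=0)
stale = purchase_boost(3, days_since_last=30)  # half of `fresh` after one half-life
policy_filter = cohort_policy_filter(["vegan"])
# policy_filter == {"terms": {"cohorts": ["_all", "vegan"]}}
```

The logarithm keeps a product bought fifty times from drowning out everything else, and the decay lets old habits fade.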

Ordering (innermost → outermost)

  1. Base query (keyword/semantic)
  2. Governance policy layer (hard filters + soft boosts)
  3. Business-signal boosts (margin, popularity)
  4. Purchase history boosts (outermost)
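
The layering above can be read as a scoring pipeline. The sketch below assumes multiplicative boosts and hypothetical field names (blocked_categories, category_boosts, margin, popularity); the source only specifies the order of the layers, not how they combine.

```python
def final_score(item, base_score, policy, history_boosts):
    """Apply the four layers in order; returns None if a hard filter excludes the item."""
    # 1. Base query score (keyword/semantic)
    score = base_score
    # 2. Governance: hard filters first, then soft boosts
    if item["category"] in policy["blocked_categories"]:
        return None
    score *= policy["category_boosts"].get(item["category"], 1.0)
    # 3. Business-signal boosts
    score *= 1.0 + 0.1 * item["margin"] + 0.1 * item["popularity"]
    # 4. Purchase history (outermost): only reorders what governance allowed
    score *= 1.0 + history_boosts.get(item["id"], 0.0)
    return score


# Made-up policy, items, and boosts
policy = {"blocked_categories": {"alcohol"}, "category_boosts": {"produce": 1.5}}
apple = {"id": "apple", "category": "produce", "margin": 0.2, "popularity": 0.5}
wine = {"id": "wine", "category": "alcohol", "margin": 0.9, "popularity": 0.9}
history = {"apple": 0.5, "wine": 5.0}

apple_score = final_score(apple, 1.0, policy, history)
wine_score = final_score(wine, 1.0, policy, history)
# wine_score is None: a large history boost cannot override a hard governance filter.
```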
