Maximal Marginal Relevance for Keyphrase Extraction

The problem

Keyphrase extractors (TextRank, RAKE, POS tagging) often produce redundant results. “Good Product,” “Great Product,” “Nice Product,” and “Excellent Product” can all rank highly yet convey the same information, wasting limited display space.

Solution 1: Cosine similarity filtering

Remove any phrase whose cosine similarity to another phrase exceeds a threshold (e.g., 0.9). The threshold has to be tuned by hand, and similar phrases that fall just below the cutoff still get through.
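
As a rough illustration of the filtering idea, the sketch below walks a best-first list of phrases and keeps a phrase only if it is not too similar to anything already kept. The function names (`cosine`, `filter_redundant`), the best-first ordering, and the 2-d vectors are assumptions made up for this example; a real pipeline would use embeddings from whatever model it already has.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filter_redundant(phrases, vectors, threshold=0.9):
    """Keep a phrase only if it is not too similar to any phrase already kept.

    Assumes `phrases` is sorted best-first by the extractor's own score.
    """
    kept = []  # indices of surviving phrases
    for i in range(len(phrases)):
        if all(cosine(vectors[i], vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return [phrases[i] for i in kept]

# Toy 2-d embeddings, invented for illustration only.
phrases = ["good product", "great product", "fast delivery"]
vectors = [(1.0, 0.0), (0.98, 0.2), (0.0, 1.0)]
result = filter_redundant(phrases, vectors, threshold=0.9)
print(result)
```

Raising the threshold to 0.99 in this toy setup lets “great product” back in, which is the tuning problem described above: the outcome depends entirely on where the cutoff sits.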

Solution 2: MMR re-ranking

MMR score = λ × Sim(phrase, document) − (1−λ) × max Sim(phrase, previously_selected_phrases)

λ = 0.5: equal weight on relevance and diversity
λ → 1: prioritize relevance
λ → 0: prioritize diversity
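
To make the effect of λ concrete, here is a small worked example of the score formula above. The two candidates and their similarity values are hypothetical numbers chosen for illustration, and `mmr_score` is just a name used here, not part of any library.

```python
def mmr_score(relevance, max_sim_to_selected, lam):
    """lam * relevance - (1 - lam) * max similarity to already selected phrases."""
    return lam * relevance - (1 - lam) * max_sim_to_selected

# Hypothetical candidates: a highly relevant near-duplicate of something
# already selected, and a less relevant phrase that is novel.
near_duplicate = {"relevance": 0.8, "max_sim": 0.9}
novel = {"relevance": 0.6, "max_sim": 0.2}

for lam in (1.0, 0.5, 0.0):
    dup = mmr_score(near_duplicate["relevance"], near_duplicate["max_sim"], lam)
    nov = mmr_score(novel["relevance"], novel["max_sim"], lam)
    winner = "near-duplicate" if dup > nov else "novel"
    print(f"lambda={lam}: near-duplicate={dup:.2f}, novel={nov:.2f}, winner={winner}")
```

With these numbers the near-duplicate wins only at λ = 1, where redundancy is ignored; at λ = 0.5 and λ = 0 the novel phrase scores higher.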

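The algorithm selects keyphrases greedily: at each step it picks the candidate with the highest MMR score, weighing both query relevance and novelty, “the degree of dissimilarity between the document being considered and previously selected ones.” In the keyphrase setting, the candidates are phrases and the query is the document itself.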
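
The selection loop can be sketched as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: `mmr_rerank` and `cosine` are names chosen for this example, cosine similarity stands in for Sim, the first pick is treated as having zero redundancy, and the 2-d vectors are invented toy embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_rerank(doc_vec, phrases, vectors, top_n=3, lam=0.5):
    """Greedy MMR: repeatedly pick the candidate with the highest MMR score."""
    relevance = [cosine(v, doc_vec) for v in vectors]
    remaining = list(range(len(phrases)))
    selected = []
    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy is the highest similarity to any phrase already chosen.
            # It is 0 for the first pick, so that pick depends on relevance alone.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [phrases[i] for i in selected]

# Toy 2-d embeddings, invented for illustration only.
doc_vec = (1.0, 0.3)
phrases = ["good product", "great product", "nice product", "fast delivery"]
vectors = [(1.0, 0.1), (1.0, 0.12), (1.0, 0.08), (0.3, 1.0)]
ranked = mmr_rerank(doc_vec, phrases, vectors, top_n=2, lam=0.5)
print(ranked)
```

In this toy setup the three “product” phrases are nearly identical, so once one of them is selected the others are penalized and “fast delivery” takes the second slot. Calling the same function with `lam=1.0` falls back to ranking by relevance alone, and two “product” phrases come out on top.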

Result

The top N keyphrases now cover distinct ideas. Similar phrases end up ranked far apart, which removes the clustering problem where redundant terms dominate the results.

People

Aditya Kumar
