Linear Score Combination

Definition

Linear Score Combination is a hybrid search fusion method that merges ranked lists from multiple retrieval systems by computing a weighted sum of their normalized scores:

score(d) = α · score_dense(d) + (1 - α) · score_sparse(d)

Where α ∈ [0, 1] controls the balance between dense (semantic) and sparse (lexical) signals.

Contrast with RRF

Reciprocal Rank FusionLinear Score Combination
InputRanks onlyRaw scores
Score calibration neededNoYes — scores must be normalized
Sensitive to score magnitudeNoYes
Tunable weightNo (k parameter only)Yes (α per retriever)
RobustnessHigh — works out of the boxLower — requires careful normalization
Performance ceilingGoodCan be higher if scores are well-calibrated

Score Normalization

Raw scores from different systems are not comparable. Common normalization strategies before combining:

  • Min-max normalization: (score - min) / (max - min) — maps to [0, 1] but sensitive to outliers
  • Z-score: (score - mean) / std — normalizes distribution shape
  • Softmax: turns scores into probabilities — preserves relative ordering

The choice of normalization significantly affects fusion quality.

Tuning α

α is a hyperparameter typically tuned on a held-out validation set using an offline metric (NDCG, MRR). Common starting point: α = 0.5 (equal weight). In practice:

  • Higher α → more semantic signal (better for paraphrase/intent matching)
  • Lower α → more lexical signal (better for exact term, product code, name queries)

Some systems learn α per query type using Search Intent classification.

When to Use vs. RRF

  • Use RRF when you can’t normalize scores reliably or want a robust no-tuning baseline
  • Use Linear Combination when scores are well-calibrated and you need fine-grained control over the semantic/lexical tradeoff