LoRA — Low-Rank Adaptation

Parameter-efficient fine-tuning technique. Hu et al., 2021 (arXiv:2106.09685). Instead of updating all weights of a pre-trained model, LoRA injects small trainable low-rank matrices alongside the frozen originals. The dominant approach in PEFT.

Core Idea

For a frozen weight matrix W ∈ R^(d×k), LoRA parameterises the update as:

W' = W + ΔW = W + A · B
where A ∈ R^(d×r),  B ∈ R^(r×k),  r << min(d, k)

W is frozen — never updated
A and B are trained — they hold the adaptation
r (rank) is the key hyperparameter — typically 4–64; lower = fewer params, less capacity

At inference the adapter can be merged into W with no overhead: W_merged = W + A·B. Or kept separate and swapped per task.

Why It Works

Full fine-tuning updates O(d×k) parameters. LoRA updates O((d+k)×r) — orders of magnitude fewer when r is small. Despite this, the update matrix ΔW = A·B can represent the relevant adaptations because weight updates in over-parameterised models tend to have low intrinsic rank.

Practical Parameters

Param	Meaning	Typical values
`r`	Rank	4, 8, 16, 32, 64
`alpha`	Scaling factor (ΔW scaled by alpha/r)	16, 32
`target_modules`	Which layers get adapters	Q, K, V projections; sometimes FFN
`dropout`	Adapter dropout	0.05–0.1

Applications in Search

Use case	What’s adapted	Notes
Embedding fine-tuning	Bi-encoder model	LoRA on Q/K/V; domain adaptation without full retraining
LLM query rewriting	Instruction-tuned LLM	Teach the model domain vocabulary
Intent/judgment generation	LLM	Fine-tune for consistent label format
RAG synthesis	LLM	Domain-specific answer style

For embedding models specifically: LoRA fine-tuning with contrastive loss achieves comparable recall to full fine-tuning at a fraction of the compute. See Fine-Tuning Qwen3 Embeddings for Product Category Classification (LoRA on 615M-param model, 0.836 macro-F1).

vs. Full Fine-tuning

	Full fine-tuning	LoRA
Params updated	All (billions)	~0.1–1%
Memory	Huge (optimizer states for all params)	Small
Training time	Long	Fast
Quality	Best	Near-equivalent on most tasks
Multi-task	One model per task	One base + many adapters

QLoRA — LoRA + 4-bit quantization of base model; enables fine-tuning on consumer GPUs
PEFT — umbrella term; LoRA is the dominant PEFT method
Embedding Fine-tuning — domain adaptation of embedding models; LoRA is increasingly the standard approach
LLM — primary target for LoRA in search pipelines

Awesome Search KG

Explorer

LoRA

LoRA — Low-Rank Adaptation

Core Idea

Why It Works

Practical Parameters

Applications in Search

vs. Full Fine-tuning

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

LoRA

LoRA — Low-Rank Adaptation

Core Idea

Why It Works

Practical Parameters

Applications in Search

vs. Full Fine-tuning

Related Concepts

Graph View

Table of Contents

Backlinks