LoRA — Low-Rank Adaptation

A parameter-efficient fine-tuning (PEFT) technique introduced by Hu et al., 2021 (arXiv:2106.09685). Instead of updating all weights of a pre-trained model, LoRA keeps them frozen and trains small low-rank matrices injected alongside them. It is one of the most widely used PEFT methods.

Core Idea

For a frozen weight matrix W ∈ R^(d×k), LoRA parameterises the update as:

W' = W + ΔW = W + A · B
where A ∈ R^(d×r),  B ∈ R^(r×k),  r << min(d, k)
  • W is frozen — never updated
  • A and B are trained — they hold the adaptation
  • r (rank) is the key hyperparameter — typically 4–64; lower = fewer params, less capacity

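At inference the adapter can be merged into the base weight, W_merged = W + A·B (with the alpha/r scaling folded into the product when it is used), so the adapted model adds no latency over the base model. Alternatively, adapters can be kept separate and swapped per task.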
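
The equivalence between the separate-adapter path and the merged weight can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the toy dimensions, the alpha value, the row-vector convention y = x·W, and the random values standing in for trained A and B are all assumptions made for illustration. Factor names follow this note (A is d×r, B is r×k); Hu et al. name the two factors the other way round.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed); real projection layers are far larger.
d, k, r, alpha = 6, 4, 2, 4
scale = alpha / r                      # the alpha/r scaling applied to the update

W = rng.normal(size=(d, k))            # frozen pre-trained weight
A = rng.normal(size=(d, r)) * 0.1      # random stand-in for a trained factor
B = rng.normal(size=(r, k)) * 0.1      # random stand-in for a trained factor

x = rng.normal(size=(3, d))            # batch of 3 input row vectors

# Adapter kept separate: base output plus the low-rank correction.
y_adapter = x @ W + scale * (x @ A) @ B

# Adapter merged into the weight: a single matmul at inference time.
W_merged = W + scale * (A @ B)
y_merged = x @ W_merged

print(np.allclose(y_adapter, y_merged))
```

Keeping the adapter separate costs two extra small matmuls per layer but lets one base model serve several tasks; merging removes that cost for a single fixed task.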

Why It Works

For a single weight matrix, full fine-tuning updates d×k parameters, while LoRA updates (d+k)×r, which is far fewer when r << min(d, k). Despite this, ΔW = A·B can still represent the relevant adaptation: the working hypothesis, supported empirically by Hu et al., is that weight updates in over-parameterised models tend to have low intrinsic rank.
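
To make the saving concrete, the two counts can be compared for one matrix. This is a small sketch under an assumed layer size: d = k = 4096 is chosen for illustration and does not come from the note. With r = 8 it gives 65,536 trainable parameters against 16,777,216, a 256-fold reduction for that matrix.

```python
def full_params(d: int, k: int) -> int:
    """Parameters updated by full fine-tuning of one d x k matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in the LoRA factors A (d x r) and B (r x k)."""
    return (d + k) * r


d, k = 4096, 4096  # assumed size of a single projection matrix
for r in (4, 8, 16, 64):
    full = full_params(d, k)
    lora = lora_params(d, k, r)
    print(f"r={r:>2}: {lora:>9,} trainable vs {full:,} ({lora / full:.2%})")
```

The ratio here is per adapted matrix; the share of the whole model is lower still, because only the targeted layers receive adapters.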

Practical Parameters

| Param | Meaning | Typical values |
|---|---|---|
| r | Rank | 4, 8, 16, 32, 64 |
| alpha | Scaling factor (ΔW scaled by alpha/r) | 16, 32 |
| target_modules | Which layers get adapters | Q, K, V projections; sometimes FFN |
| dropout | Adapter dropout | 0.05–0.1 |
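
As one possible way to express these settings in code, the sketch below uses the Hugging Face PEFT library's LoraConfig. The specific values are picked from the typical ranges above, and the module names "q_proj", "k_proj" and "v_proj" are an assumption: they match many Llama-style checkpoints but differ between architectures, so they should be checked against the model actually being adapted.

```python
from peft import LoraConfig

# Illustrative values only; tune r, lora_alpha and dropout per task.
lora_config = LoraConfig(
    r=8,                   # rank of the low-rank update
    lora_alpha=16,         # update is scaled by lora_alpha / r
    target_modules=["q_proj", "k_proj", "v_proj"],  # assumed layer names
    lora_dropout=0.05,     # dropout applied on the adapter path
)
```

A config like this is typically passed, together with a loaded base model, to PEFT's get_peft_model, which wraps the targeted layers and leaves all other weights frozen.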

| Use case | What's adapted | Notes |
|---|---|---|
| Embedding fine-tuning | Bi-encoder model | LoRA on Q/K/V; domain adaptation without full retraining |
| LLM query rewriting | Instruction-tuned LLM | Teach the model domain vocabulary |
| Intent/judgment generation | LLM | Fine-tune for consistent label format |
| RAG synthesis | LLM | Domain-specific answer style |

For embedding models specifically, LoRA fine-tuning with a contrastive loss can reach recall comparable to full fine-tuning at a fraction of the compute. See Fine-Tuning Qwen3 Embeddings for Product Category Classification (LoRA on a 615M-param model, 0.836 macro-F1).
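
The note does not say which contrastive loss is meant. One common choice for bi-encoders is InfoNCE with in-batch negatives, sketched below in NumPy as an assumption rather than as the method used in the referenced write-up. The function name and the temperature value are hypothetical, and in a real run the embeddings would come from the LoRA-adapted encoder instead of random arrays.

```python
import numpy as np


def info_nce(queries: np.ndarray, positives: np.ndarray,
             temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives.

    Row i of `positives` is the match for row i of `queries`;
    every other row in the batch acts as a negative.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature               # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # diagonal = correct pairs


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))            # random stand-in embeddings
loss_aligned = info_nce(emb, emb)        # each query paired with itself
loss_shuffled = info_nce(emb, emb[::-1]) # every query paired with a wrong item
print(loss_aligned < loss_shuffled)
```

During LoRA fine-tuning only the adapter factors receive gradients from this loss; the encoder's original weights stay fixed.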

vs. Full Fine-tuning

| | Full fine-tuning | LoRA |
|---|---|---|
| Params updated | All (billions) | ~0.1–1% |
| Memory | Huge (optimizer states for all params) | Small |
| Training time | Long | Fast |
| Quality | Best | Near-equivalent on most tasks |
| Multi-task | One model per task | One base + many adapters |

  • QLoRA — LoRA + 4-bit quantization of base model; enables fine-tuning on consumer GPUs
  • PEFT — umbrella term; LoRA is the dominant PEFT method
  • Embedding Fine-tuning — domain adaptation of embedding models; LoRA is increasingly the standard approach
  • LLM — primary target for LoRA in search pipelines