PEFT — Parameter-Efficient Fine-Tuning

Umbrella term for fine-tuning techniques that update only a small fraction of model parameters while keeping most weights frozen. Hugging Face's peft library is the de facto standard implementation. PEFT enables domain adaptation of large models at a fraction of the cost of full fine-tuning.

Why PEFT

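Fully fine-tuning a 7B model requires optimizer state for all 7B parameters: about 84 GB with mixed-precision Adam (fp32 master weights plus two moment estimates, 12 bytes per parameter), before counting the half-precision weights, gradients, and activations. PEFT methods usually train only about 0.1–1% of parameters. This makes fine-tuning practical on limited hardware and lets one base model serve many tasks via swappable adapters.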
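The memory comparison above can be checked with a little arithmetic. This is a back-of-envelope sketch, assuming mixed-precision Adam at 12 bytes per trainable parameter and a hypothetical Llama-style 7B shape (32 layers, hidden size 4096) with LoRA rank 16 on the four attention projections; real models and configurations will differ.

```python
# Back-of-envelope optimizer-state memory: full fine-tuning vs. LoRA.
# Assumptions (hypothetical): nominal 7B params, Llama-style shape with
# 32 layers and hidden size 4096, LoRA rank 16 on q, k, v, o projections.

BYTES_PER_PARAM_ADAM = 12  # fp32 master weight + Adam first and second moments, 4 bytes each


def adam_state_gb(n_params: int) -> float:
    """Optimizer-state memory in decimal GB for mixed-precision Adam."""
    return n_params * BYTES_PER_PARAM_ADAM / 1e9


def lora_params(n_layers: int, d_model: int, rank: int, matrices_per_layer: int) -> int:
    """Trainable LoRA params: each adapted d x d weight gets A (r x d) and B (d x r)."""
    return n_layers * matrices_per_layer * 2 * rank * d_model


full = 7_000_000_000
lora = lora_params(n_layers=32, d_model=4096, rank=16, matrices_per_layer=4)

print(f"full FT Adam state: {adam_state_gb(full):.0f} GB")
print(f"LoRA trainable params: {lora:,} ({100 * lora / full:.2f}% of 7B)")
print(f"LoRA Adam state: {adam_state_gb(lora):.2f} GB")
```

Under these assumptions the script reports 84 GB of Adam state for full fine-tuning versus about 16.8M trainable LoRA parameters (0.24% of 7B) needing roughly 0.2 GB of optimizer state. The frozen base weights still have to sit in GPU memory (about 14 GB in fp16), which is the part QLoRA shrinks.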

Main Methods

Method	How it works	Params updated	Best for
LoRA	Low-rank matrices injected alongside frozen weights	~0.1–1%	LLMs, embedding models — dominant method
QLoRA	LoRA + 4-bit base quantization	~0.1–1%	Consumer GPU fine-tuning
Prefix Tuning	Learnable tokens prepended to every layer’s key/value	~0.1%	Sequence generation
Prompt Tuning	Learnable tokens at input only	<0.01%	Large models (≥11B); less effective on small models
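IA³	Rescales attention keys/values and feed-forward activations with learned vectors	~0.01%	Few-shot; minimal params
Adapter layers	Small bottleneck MLPs inserted inside each transformer block	~1–3%	Original PEFT approach; adds inference latency because the extra layers cannot be merged into the base weights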

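LoRA dominates in practice: it offers a strong quality/efficiency trade-off across LLMs and embedding models, and its low-rank update can be merged into the base weights after training, so a merged model adds no inference latency.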
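A minimal NumPy sketch of the LoRA mechanics, following the formulation in the original LoRA paper: the frozen weight W is augmented by a trainable low-rank product scaled by alpha / r, with B zero-initialized so training starts from the unmodified model. The class name `LoRALinear` and its methods are illustrative only (not the peft API), and no training loop is shown.

```python
import numpy as np


class LoRALinear:
    """Sketch of a LoRA-augmented linear layer: y = x W^T + (alpha / r) x A^T B^T."""

    def __init__(self, W: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                    # frozen pretrained weight, never updated
        self.A = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))              # trainable, zero init so the update starts at 0
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus low-rank path
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # Fold the low-rank update into W for deployment: same cost as the original layer
        return self.W + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 64))

assert np.allclose(layer(x), x @ W.T)          # B = 0: output equals the frozen layer
layer.B = rng.normal(size=layer.B.shape)       # stand-in for what training would learn
assert np.allclose(layer(x), x @ layer.merged_weight().T)  # merged weight gives the same output
print(layer.trainable_params(), "of", W.size)  # 512 of 4096
```

Because the update folds into W, a merged model runs at exactly the cost of the original layer, which is the practical advantage over adapter layers that the table notes.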

In the Search Pipeline

Stage	PEFT technique	Purpose
Embedding model	LoRA (contrastive)	Domain adaptation of bi-encoder
Query understanding	QLoRA	Fine-tune LLM on domain queries
Reranking	LoRA on cross-encoder	Domain-specific relevance scoring
Answer synthesis (RAG)	QLoRA	Domain vocabulary, format control
Judgment generation	QLoRA	Consistent relevance labels
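The stages in this table can share one frozen base model, each with its own small low-rank update selected at call time. The sketch below illustrates that "swappable adapters" pattern at toy scale; the stage keys, the `forward` routing function, and all weights are hypothetical random stand-ins rather than any library's serving API.

```python
from typing import Optional

import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 256, 4, 8.0
W_base = rng.normal(size=(d, d))  # one frozen base weight shared by every stage

# Hypothetical per-stage LoRA adapters: (A, B) pairs with random stand-in values
STAGES = ["embedding", "reranking", "answer_synthesis"]
adapters = {s: (rng.normal(size=(r, d)), rng.normal(size=(d, r))) for s in STAGES}


def forward(x: np.ndarray, stage: Optional[str] = None) -> np.ndarray:
    """Base projection, plus the chosen stage's low-rank update if a stage is given."""
    y = x @ W_base.T
    if stage is not None:
        A, B = adapters[stage]
        y = y + (alpha / r) * (x @ A.T) @ B.T
    return y


x = rng.normal(size=(1, d))
outputs = {s: forward(x, s) for s in STAGES}  # same input, stage-specific behavior

base_params = W_base.size
adapter_params = sum(A.size + B.size for A, B in adapters.values())
print(f"base: {base_params} params, all adapters: {adapter_params} params")
```

Only the per-stage (A, B) pairs differ, so adding a stage costs 2·r·d parameters per adapted matrix rather than a full model copy.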

Relationship to Domain Adaptation

PEFT is the primary mechanism for domain adaptation of neural search components. Before PEFT, domain adaptation required one of the following:

Full fine-tuning (expensive, needs large GPU cluster)
Prompt engineering (limited effect on retrieval quality)
Training from scratch (prohibitive)

With LoRA/QLoRA, a search team with a single GPU can produce a domain-adapted embedding model or LLM reranker in hours.
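The single-GPU claim rests largely on how much memory the frozen base weights need. QLoRA stores them in 4-bit (NF4 in the QLoRA paper) and trains LoRA matrices on top. This rough sketch counts weights only and ignores activations, quantization constants, and the small LoRA optimizer state, so real usage will be somewhat higher.

```python
# Rough weight-only memory for a nominal 7B model at different precisions.
# Ignores activations, KV cache, quantization constants, and LoRA optimizer state.

BITS_PER_PARAM = {"fp32": 32, "fp16/bf16": 16, "int8": 8, "4-bit (QLoRA)": 4}


def weight_gb(n_params: int, bits: int) -> float:
    """Decimal GB needed to hold n_params weights at the given bit width."""
    return n_params * bits / 8 / 1e9


n = 7_000_000_000
for name, bits in BITS_PER_PARAM.items():
    print(f"{name:>14}: {weight_gb(n, bits):5.1f} GB")
```

Going from 14 GB (fp16) to about 3.5 GB (4-bit) leaves headroom on a single consumer GPU for activations and the LoRA state, which is the arithmetic behind QLoRA's "Consumer GPU fine-tuning" row in the methods table.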

Related Concepts

LoRA — the dominant PEFT method
QLoRA — LoRA + 4-bit quantization
Embedding Fine-tuning — domain adaptation of embedding models; PEFT is the practical path
LLM — primary target for PEFT in search pipelines

Awesome Search KG