Fine-Tuning LLMs: From Zero to Hero with Python & Ollama
Practical tutorial: fine-tune a small LLM (Phi-3 Mini) using Unsloth (2× training speedup), export to GGUF format, and serve locally via Ollama.
Key Steps
- Dataset: 20–500 input-output pairs in JSON format
- Setup: Google Colab T4 GPU (free); install unsloth, trl, peft
- Model: Load phi-3-mini-4k-instruct in 4-bit via Unsloth
- LoRA: r=16, target all projection layers via get_peft_model
- Train: SFTTrainer, max_steps=60, learning_rate=2e-4
- Export: save_pretrained_gguf → GGUF format for Ollama
- Serve: ollama create model-name -f Modelfile → ollama run model-name
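The first and last steps above need no GPU and can be sketched in plain Python: turning input-output pairs into training text, and writing the Modelfile that ollama create reads. This is a minimal sketch, not the tutorial's own code. The "input"/"output" key names, the prompt layout, the file path ./model.gguf, and all function names are assumptions chosen for illustration; FROM, PARAMETER, and SYSTEM are standard Modelfile instructions.

```python
import json
from pathlib import Path

# Hypothetical prompt layout; swap in the chat template your base model expects.
PROMPT_TEMPLATE = "### Input:\n{input}\n\n### Output:\n{output}"


def validate_pairs(pairs):
    """Check that every example carries both assumed keys, "input" and "output"."""
    for i, pair in enumerate(pairs):
        missing = {"input", "output"} - pair.keys()
        if missing:
            raise ValueError(f"example {i} is missing keys: {sorted(missing)}")
    return pairs


def load_pairs(path):
    """Read a JSON file holding a list of {"input": ..., "output": ...} objects."""
    return validate_pairs(json.loads(Path(path).read_text(encoding="utf-8")))


def to_training_text(pairs):
    """Flatten each pair into a single string, one per training example."""
    return [PROMPT_TEMPLATE.format(input=p["input"], output=p["output"]) for p in pairs]


def build_modelfile(gguf_path, temperature=0.2, system_prompt=None):
    """Return Modelfile text pointing Ollama at an exported GGUF file."""
    lines = [f"FROM {gguf_path}", f"PARAMETER temperature {temperature}"]
    if system_prompt:
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [{"input": "red running shoes under $50",
             "output": '{"category": "shoes", "max_price": 50}'}]
    print(to_training_text(validate_pairs(demo))[0])
    print(build_modelfile("./model.gguf", system_prompt="Reply with JSON only."))
```

Writing the output of build_modelfile to a file named Modelfile next to the exported GGUF is then enough for the ollama create step. The actual GGUF file name depends on the export settings, so check the output directory before filling in the FROM line.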
Use Cases for Search
- Fine-tune for structured JSON output from search queries (consistent extraction)
- Domain-specific query classification
- On-premise deployment where privacy prevents using external APIs
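For the structured-output use case, it helps to check that each reply really is valid JSON with the expected fields before passing it downstream. The sketch below assumes a hypothetical two-field schema (category, max_price) for a product-search extractor; the field names and the function name are illustrative, not part of the tutorial.

```python
import json

# Hypothetical schema for a product-search extractor; adjust to your own fields.
REQUIRED_FIELDS = {"category": str, "max_price": (int, float)}


def parse_extraction(raw_output):
    """Return the parsed dict if raw_output holds JSON matching the schema, else None."""
    # Models sometimes wrap JSON in extra text, so keep only the outermost braces.
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return None
    return data


if __name__ == "__main__":
    print(parse_extraction('Sure! {"category": "shoes", "max_price": 49.99}'))
    print(parse_extraction('{"category": "shoes"}'))
```

Running a check like this over a held-out set of queries gives a simple measure of how consistently the fine-tuned model sticks to the format: the fraction of replies for which the function returns a dict rather than None.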
Key Tips
- Temperature 0.1–0.3 for consistent, near-deterministic outputs
- 100+ diverse examples help prevent overfitting to format
- Monitor training loss; it should decrease steadily
- Unsloth is roughly 2× faster than vanilla Hugging Face training
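The loss-monitoring tip can be turned into a quick check once training finishes. The sketch below compares the average loss over the first few logged steps with the average over the last few. The function name, the window size, and the 5% threshold are arbitrary assumptions, not values from the tutorial.

```python
def loss_is_decreasing(losses, window=5, min_drop=0.05):
    """True if mean loss over the last `window` steps is at least `min_drop`
    (as a fraction) below the mean over the first `window` steps."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window logged losses")
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    if first <= 0:
        return False  # avoid dividing by zero on degenerate logs
    return (first - last) / first >= min_drop


if __name__ == "__main__":
    # Synthetic loss curves for illustration only, not real training logs.
    falling = [2.0 - 0.02 * step for step in range(60)]
    flat = [1.5] * 60
    print(loss_is_decreasing(falling))
    print(loss_is_decreasing(flat))
```

With a Hugging Face style trainer, the per-step values can usually be collected from trainer.state.log_history by keeping the entries that contain a "loss" key. A flat or rising curve over 60 steps suggests revisiting the learning rate or the dataset formatting.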