Interactive Explainer
Full fine-tuning of a 70B model touches 70 billion parameters. LoRA touches 200 million. The quality is often indistinguishable, and the checkpoint is 300× smaller. Slide the rank, and see the numbers move.
The LoRA insight (Hu et al., 2021): the change you want to apply to a pretrained weight is typically low-rank. You don't need 70B parameters of update budget — you need a tiny correction. So instead of fine-tuning W directly, replace it with W + BA, where B and A are two small trainable matrices and W stays frozen.
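To make the budget reduction concrete, here is a quick parameter count for a single square projection matrix (the 4096 hidden size is a hypothetical value, roughly that of a 7B-class model; the helper name is ours):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # B is d_out x r, A is r x d_in, so the update BA has rank <= r
    return d_out * r + r * d_in

d = 4096                       # hypothetical hidden size of one projection
full = d * d                   # updating W directly
lora = lora_params(d, d, 8)    # updating only B and A at rank 8
print(full, lora, full // lora)  # 16777216 65536 256
```

At rank 8 the update budget for this one matrix drops by a factor of d / (2r) = 256, and the same ratio repeats at every layer LoRA touches.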
| Method | Trainable params | Ratio vs full | Adapter size on disk |
|---|---|---|---|
| Full fine-tune | – | 1.00× | – |
| LoRA · r=8 | – | – | – |
| QLoRA (4-bit base + r=8) | – | – | – |
The magic: LoRA adapters at r=8 for a 7B model amount to ~4M trainable parameters. You can fine-tune on a single consumer GPU (24 GB of VRAM), save the adapter as a ~16 MB file, and ship it alongside the base weights.
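The ~16 MB figure is just the parameter count times the bytes per parameter. A back-of-the-envelope sketch, assuming the adapter is stored in fp32 (halve it for fp16/bf16 checkpoints):

```python
params = 4.2e6         # trainable LoRA params at r=8 on a 7B model
bytes_per_param = 4    # fp32 storage; 2 for fp16/bf16
size_mb = params * bytes_per_param / 1e6
print(f"{size_mb:.0f} MB")  # 17 MB
```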
y = W₀x + BAx,  with rank(BA) ≤ r
W₀ is frozen. A ∈ ℝ^{r×d} is initialized from a Gaussian; B ∈ ℝ^{d×r} is initialized to zero, so at t=0 the update BA is zero and the model is exactly the pretrained one. Only A and B receive gradients.
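The forward pass above can be sketched in a few lines of NumPy (shapes follow the definitions just given; most implementations also scale the update by α/r, which we omit here to match the equation):

```python
import numpy as np

d, r = 16, 4
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, d))        # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # Gaussian init
B = np.zeros((d, r))                    # zero init, so BA = 0 at t = 0

x = rng.standard_normal(d)
y = W0 @ x + B @ (A @ x)                # equals W0 @ x at initialization

assert np.allclose(y, W0 @ x)
```

Once B receives its first gradient update, BA becomes nonzero and y starts to diverge from the frozen model's output.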
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# example 7B checkpoint; swap in any causal LM
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=8,                                  # rank
    lora_alpha=16,                        # scale
    target_modules=["q_proj", "v_proj"],  # where to inject
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base_model, lora_cfg)
model.print_trainable_parameters()
# > trainable params: 4.2M  all params: 7B  trainable%: 0.06%
```
QLoRA (Dettmers et al., 2023) quantizes the frozen base model to 4-bit NormalFloat (NF4), then trains LoRA adapters in 16-bit precision on top. This makes it possible to fine-tune a 65B model on a single 48 GB GPU, matching 16-bit LoRA quality at a fraction of the memory.
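With the Hugging Face stack, the QLoRA recipe is the same peft setup with a 4-bit quantization config on the base model. A sketch, assuming `bitsandbytes` is installed and using a placeholder checkpoint name:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base (the QLoRA recipe)
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # adapters train in 16-bit
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder; any causal LM works
    quantization_config=bnb_cfg,
)
model = get_peft_model(base, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
))
```

Only the tiny fp16/bf16 adapter weights receive gradients; the 4-bit base is never updated, which is where the memory savings come from.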
Part of the ES 667 Deep Learning course · IIT Gandhinagar · Aug 2026.