Edge Deployment & Model Optimization

Week 12 · CS 203: Software Tools and Techniques for AI

Prof. Nipun Batra
IIT Gandhinagar

The Problem: Models Are Too Big

Your trained model:

  • ResNet-50: 100 MB
  • BERT: 440 MB
  • GPT-2: 1.5 GB

Target devices:

  • Smartphone: Limited memory, battery
  • Raspberry Pi: 2-4 GB RAM
  • Web browser: Size limits

Challenge: How do we run models on constrained devices?

Edge vs Cloud Deployment

| Aspect       | Cloud                 | Edge              |
|--------------|-----------------------|-------------------|
| Compute      | Unlimited             | Limited           |
| Latency      | Network + processing  | Processing only   |
| Privacy      | Data leaves device    | Data stays local  |
| Cost         | Per-request pricing   | One-time hardware |
| Connectivity | Requires internet     | Works offline     |

Edge examples: Phone apps, IoT sensors, in-car systems

Why Deploy on Edge?

1. Speed

  • No network latency
  • Real-time predictions

2. Privacy

  • Data never leaves the device
  • GDPR/HIPAA compliance

3. Reliability

  • Works without internet
  • No server downtime issues

4. Cost

  • No cloud compute costs
  • Scales with devices, not users

The Speed of Light Problem

Physics limits cloud AI: light travels roughly 300 km per millisecond, and no algorithm can shave the round trip to a data center below that.

Cloud AI:  You → Network (50ms) → Server → Network (50ms) → Response
           Total: 100+ ms

Edge AI:   You → Local Device → Response
           Total: 10 ms

Self-driving car at 60mph: 100ms = 2.7 meters (too late!), 10ms = 0.27 meters (can react)

Model Optimization Techniques

| Technique    | Size Reduction | Speed Up      | Accuracy Loss      |
|--------------|----------------|---------------|--------------------|
| Quantization | 4x smaller     | 2-4x faster   | < 1%               |
| Pruning      | 2x smaller     | 1.5-2x faster | 1-2%               |
| Distillation | Varies         | Varies        | Can match original |

The good news: You can often get 4x smaller AND faster with minimal accuracy loss!

Quantization: The Big Idea

Normal models use 32-bit floats:

  • Each weight: 32 bits
  • High precision, but large

Quantized models use 8-bit integers:

  • Each weight: 8 bits
  • 4x smaller, faster on CPUs

Float32: 32 bits per weight
Int8:     8 bits per weight  → 4x compression!

The Precision Intuition

Do you really need 9 decimal places? For a neural-network weight, 0.234567891 and 0.23 behave almost identically.

Quantization exploits this: trade precision you don't need for speed and size you do need.

| Precision      | Example Value  | Use Case        |
|----------------|----------------|-----------------|
| Float32 (full) | 0.234567891... | Training        |
| Float16        | 0.2346         | GPU inference   |
| Int8           | 60/255 ≈ 0.24  | Edge deployment |

The key insight: Neural networks are surprisingly robust to reduced precision.

Quantization Example

Before (Float32):

weights = [0.234, -0.567, 0.891, ...]  # 32 bits each
model_size = 100 MB

After (Int8):

weights = [45, -127, 95, ...]  # 8 bits each
model_size = 25 MB  # 4x smaller!

The math (a minimal sketch follows):

  • Find the min/max of the weights
  • Map that range onto an integer range (e.g., 0-255 for uint8)
  • Store the weights as integers, plus the scale and zero point needed to map back
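A minimal NumPy sketch of that min/max (affine) scheme. The helper names are made up for illustration; real frameworks add per-channel scales and more careful rounding.

import numpy as np

def quantize_minmax(weights):
    """Affine quantization of float weights to uint8 (illustrative only)."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0       # float step per integer step
    zero_point = round(-w_min / scale)    # integer that represents 0.0
    q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original floats."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([0.234, -0.567, 0.891], dtype=np.float32)
q, scale, zp = quantize_minmax(w)
print(q)                          # e.g. [140   0 255]
print(dequantize(q, scale, zp))   # close to w, but not exact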

Types of Quantization

1. Post-Training Quantization (PTQ)

  • Train model normally (Float32)
  • Convert to Int8 after training
  • Quick and easy

2. Quantization-Aware Training (QAT)

  • Simulate quantization during training
  • Model learns to handle lower precision
  • Better accuracy, more effort

For most cases: PTQ is good enough!
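When QAT is worth the extra effort, the workflow looks roughly like the sketch below (PyTorch eager-mode API; exact module paths vary by version, and real models also need QuantStub/DeQuantStub at the float/int boundaries).

import torch

# Rough QAT sketch, assuming MyModel from the other examples
model = MyModel()
model.train()

# Attach a QAT config: fake-quantize ops simulate Int8 during training
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qat_model = torch.quantization.prepare_qat(model)

# ... fine-tune qat_model for a few epochs as usual ...

# After training, convert to a real Int8 model for deployment
qat_model.eval()
int8_model = torch.quantization.convert(qat_model)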

Quantization in PyTorch

Dynamic quantization (easiest):

import torch

# Original model
model = MyModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Quantize
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # Layers to quantize
    dtype=torch.qint8
)

# Save
torch.save(quantized_model.state_dict(), "model_quantized.pth")
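Inference with the quantized model is unchanged; the short sketch below (input shape is just an example assumption) also notes a reloading caveat.

# Call the quantized model exactly like the original
with torch.no_grad():
    prediction = quantized_model(torch.randn(1, 3, 224, 224))

# Caveat: to reload "model_quantized.pth" later, rebuild MyModel and
# re-apply quantize_dynamic first, so the state-dict keys match the
# quantized module structure.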

Checking Model Size

import os

def get_model_size(path):
    """Get model size in MB."""
    size = os.path.getsize(path) / (1024 * 1024)
    return f"{size:.1f} MB"

print(f"Original: {get_model_size('model.pth')}")
print(f"Quantized: {get_model_size('model_quantized.pth')}")

# Output:
# Original: 100.0 MB
# Quantized: 25.2 MB

Pruning: Remove Useless Weights

Observation: Many weights in neural networks are close to zero.

Idea: Remove them!

Before pruning:    [0.9, 0.01, -0.8, 0.001, 0.7]
After 40% pruning: [0.9,  0,   -0.8,   0,   0.7]

Benefits:

  • Smaller model (once the zeros are stored in a sparse format)
  • Faster inference, if the runtime or hardware can skip the zeroed weights

Pruning in PyTorch

import torch.nn.utils.prune as prune

# Prune 30% of weights (smallest magnitudes)
prune.l1_unstructured(
    model.fc1,       # Layer to prune
    name='weight',
    amount=0.3       # Remove 30%
)

# Make pruning permanent
prune.remove(model.fc1, 'weight')

# Check sparsity
zeros = (model.fc1.weight == 0).sum()
total = model.fc1.weight.numel()
print(f"Sparsity: {zeros/total:.1%}")

Knowledge Distillation

Idea: Train a small "student" model to mimic a large "teacher" model.

Teacher (Large):  100 MB, 95% accuracy
     ↓ Knowledge Transfer
Student (Small):  10 MB, 93% accuracy

Why it works:

  • Student learns from teacher's "soft" outputs
  • More information than hard labels
  • Can get near-teacher accuracy with smaller model

The Teacher's Soft Knowledge

Hard labels throw away information. "cat" tells you nothing about cat-like vs dog-like.

                      Hard Label    Soft Label (Teacher)
Image of fluffy cat:     "cat"      cat:0.90, dog:0.08, fox:0.02
                          ↑                   ↑
                    No nuance!      "Looks a bit dog-like too"

Student learns relationships: cats and dogs are similar, cats and airplanes aren't.

Distillation: Simple Example

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3, alpha=0.5):
    # Hard loss: student vs true labels
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft loss: student vs teacher (with temperature)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction='batchmean')

    # Combine; the T*T factor keeps the soft-loss gradients on a comparable scale
    return alpha * hard_loss + (1 - alpha) * soft_loss * T * T
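A minimal training-step sketch showing how this loss is used; teacher, student, and train_loader are assumed to already exist, and the names are illustrative.

import torch

teacher.eval()                                  # teacher is frozen
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for images, labels in train_loader:
    with torch.no_grad():
        teacher_logits = teacher(images)        # soft targets, no gradients
    student_logits = student(images)

    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()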

ONNX: Universal Model Format

Problem: You trained in PyTorch, but want to deploy on mobile/web.

Solution: ONNX (Open Neural Network Exchange)

  • Standard format for neural networks
  • Export from PyTorch, TensorFlow, etc.
  • Run on any platform

PyTorch Model → ONNX → ONNX Runtime → Any Device

Exporting to ONNX

import torch

# Load model
model = MyModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Dummy input (same shape as real input)
dummy_input = torch.randn(1, 3, 224, 224)

# Export
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=['image'],
    output_names=['prediction'],
    dynamic_axes={'image': {0: 'batch_size'}}  # Variable batch
)

print("Exported to model.onnx")

Running with ONNX Runtime

import onnxruntime as ort
import numpy as np

# Load ONNX model
session = ort.InferenceSession("model.onnx")

# Prepare input
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference
outputs = session.run(
    None,  # Get all outputs
    {'image': input_data}
)

print(f"Prediction: {outputs[0]}")

Benefit: ONNX Runtime is often 2-3x faster than eager PyTorch on CPU.

ONNX Optimizations

ONNX Runtime automatically applies:

  1. Operator fusion: Combine Conv + BatchNorm + ReLU into one
  2. Constant folding: Pre-compute constants
  3. Memory optimization: Reuse buffers

Before: Conv → BatchNorm → ReLU  (3 operations)
After:  ConvBNReLU               (1 operation)
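These optimizations happen automatically, but you can control them and dump the optimized graph for inspection via SessionOptions; a sketch:

import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.optimized_model_filepath = "model_optimized.onnx"  # save the fused graph

session = ort.InferenceSession("model.onnx", sess_options=opts)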

TensorFlow Lite (TFLite)

For mobile deployment (Android/iOS):

import tensorflow as tf

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # Default optimizations (incl. quantization)
tflite_model = converter.convert()

# Save
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

TFLite is optimized for:

  • ARM processors (phones)
  • Edge TPU accelerators
  • Microcontrollers
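A minimal sketch of running the converted model with the TFLite interpreter (same model.tflite as above; assumes a float32 input model, and the input shape is read from the model itself).

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random input matching the model's expected shape (illustration only)
input_data = np.random.randn(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)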

Choosing the Right Approach

| Scenario            | Recommended Approach   |
|---------------------|------------------------|
| Quick optimization  | Quantization (PTQ)     |
| Maximum compression | Quantization + Pruning |
| Best accuracy       | Knowledge distillation |
| Mobile app          | TensorFlow Lite        |
| Cross-platform      | ONNX Runtime           |
| Web browser         | ONNX + WebAssembly     |

Start with quantization - it's the easiest and most effective!

Benchmarking Your Model

import time
import numpy as np

def benchmark(model, input_data, n_runs=100):
    """Measure average inference time."""
    # Warmup
    for _ in range(10):
        _ = model(input_data)

    # Benchmark
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        _ = model(input_data)
        times.append(time.perf_counter() - start)

    avg_time = np.mean(times) * 1000  # ms
    print(f"Average: {avg_time:.2f} ms")
    print(f"Throughput: {1000/avg_time:.1f} samples/sec")

Before vs After Optimization

| Metric   | Original | Optimized |
|----------|----------|-----------|
| Size     | 100 MB   | 25 MB     |
| Latency  | 50 ms    | 12 ms     |
| Memory   | 400 MB   | 100 MB    |
| Accuracy | 95.0%    | 94.5%     |

Trade-off: 0.5% accuracy for 4x smaller and 4x faster!

Deployment Pipeline

Train Model (Float32)
       ↓
Prune (optional)
       ↓
Quantize (Int8)
       ↓
Export (ONNX/TFLite)
       ↓
Benchmark on target device
       ↓
Deploy

Common Deployment Targets

1. Mobile Apps

  • Use TensorFlow Lite or Core ML (iOS)
  • Optimize for ARM processors
  • Consider battery usage

2. Web Browser

  • Use ONNX.js or TensorFlow.js
  • Models must be small (< 10 MB)
  • Use WebGL for acceleration

3. Embedded/IoT

  • Use TensorFlow Lite Micro
  • Very limited memory (KB, not MB)
  • May need specialized models

Real-World Example: Mobile App

Original model: ResNet-50

  • Size: 98 MB
  • Latency: 200 ms

Optimization steps:

  1. Replace with MobileNet-v2 (smaller architecture)
  2. Quantize to Int8
  3. Export to TFLite

Optimized model:

  • Size: 3.4 MB
  • Latency: 30 ms
  • Accuracy: 71% (vs 76% for ResNet)

Efficient Model Architectures

Designed for mobile/edge:

| Model           | Size   | Top-1 Accuracy | Latency |
|-----------------|--------|----------------|---------|
| MobileNet-v2    | 3.4 MB | 71.8%          | 30 ms   |
| EfficientNet-B0 | 5.3 MB | 77.1%          | 45 ms   |
| SqueezeNet      | 1.2 MB | 57.5%          | 25 ms   |

vs. Desktop models:

| Model     | Size   | Top-1 Accuracy | Latency |
|-----------|--------|----------------|---------|
| ResNet-50 | 98 MB  | 76.1%          | 200 ms  |
| VGG-16    | 528 MB | 71.5%          | 400 ms  |

Tips for Edge Deployment

1. Start with a smaller model

  • MobileNet instead of ResNet
  • DistilBERT instead of BERT

2. Always quantize

  • Easy 4x size reduction
  • Often 2-4x speed improvement

3. Profile on target device

  • Desktop performance ≠ Mobile performance
  • Test on actual hardware

4. Consider accuracy trade-offs

  • 1-2% accuracy loss is usually acceptable
  • Test with your specific use case

Summary

| Technique    | What it does          | When to use           |
|--------------|-----------------------|-----------------------|
| Quantization | 32-bit → 8-bit        | Always (first step)   |
| Pruning      | Remove small weights  | Need more compression |
| Distillation | Train smaller model   | Can afford retraining |
| ONNX         | Cross-platform format | Non-Python deployment |
| TFLite       | Mobile format         | Android/iOS apps      |

Lab Preview

This week you'll:

  1. Benchmark your model's size and speed
  2. Apply quantization and measure improvement
  3. Try pruning and compare results
  4. Export to ONNX format
  5. Run with ONNX Runtime
  6. Compare all approaches

Result: An optimized model ready for edge deployment!

Interview Questions

Common interview questions on edge deployment:

  1. "How would you deploy an ML model to a mobile device?"

    • Quantize: Reduce precision (FP32 → INT8) for 4x smaller size
    • Export to TFLite (Android/iOS) or Core ML (iOS)
    • Consider smaller architectures (MobileNet, DistilBERT)
    • Benchmark on actual device (not emulator)
  2. "What is quantization and what are its trade-offs?"

    • Converting weights from 32-bit floats to 8-bit integers
    • Benefits: 4x smaller model, 2-4x faster inference
    • Trade-off: 1-2% accuracy loss (usually acceptable)
    • Types: post-training (easy) vs quantization-aware training (better)

Questions?

Key takeaways:

  • Quantization is the easiest win (4x smaller, 2-4x faster)
  • ONNX enables cross-platform deployment
  • Start with efficient architectures when possible
  • Always benchmark on target hardware

Next week: Profiling & Performance