Pruning — The Jenga Approach
Remove weights that barely contribute to the output. Like Jenga: pull out the blocks that aren't structurally important, and the tower still stands.
Before pruning: [0.8, 0.001, -0.7, 0.002, 0.9, -0.003]
After pruning: [0.8, 0, -0.7, 0, 0.9, 0]
Set near-zero weights to exactly zero. The model still works.
Typical result: 50–90% of weights pruned with < 1% accuracy loss.
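The simplest version of this is magnitude pruning: pick a target sparsity, find the magnitude threshold that achieves it, and zero everything below it. A minimal sketch with NumPy (the function name `magnitude_prune` is ours, not from any particular library):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    # The sparsity-quantile of |w| is the cutoff below which weights are dropped.
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.array([0.8, 0.001, -0.7, 0.002, 0.9, -0.003])
print(magnitude_prune(w, 0.5))  # the near-zero weights become exactly 0
```

Real pruning pipelines usually prune gradually during training and fine-tune afterward to recover accuracy, but the core operation is this thresholding.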
Why it helps: Zeros compress extremely well on disk, and sparse formats only need to store the non-zero values and their positions.
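You can see the compression win directly by deflating a dense weight array versus a mostly-zero one. This is an illustrative sketch (the 1.2 threshold is an arbitrary choice that zeroes roughly three quarters of standard-normal weights):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
dense = rng.standard_normal(100_000).astype(np.float32)

# Prune: zero out small-magnitude weights (threshold chosen for illustration).
pruned = dense.copy()
pruned[np.abs(pruned) < 1.2] = 0.0

dense_size = len(zlib.compress(dense.tobytes()))
pruned_size = len(zlib.compress(pruned.tobytes()))
print(dense_size, pruned_size)  # the pruned array compresses far smaller
```

Random float bytes are nearly incompressible, but the runs of zero bytes in the pruned array give the compressor plenty to work with.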
Important Caveat: Pruning does not make inference faster unless the hardware or framework specifically supports sparse matrix multiplication. Otherwise the GPU still performs every multiplication, including the ones by zero.
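The caveat can be made concrete: a dense matmul touches every entry regardless of its value, while a sparse format such as SciPy's CSR stores and multiplies only the non-zeros. A small sketch (the tiny matrix here is for illustration; the speedup only materializes at high sparsity on real workloads):

```python
import numpy as np
from scipy import sparse

w = np.array([[0.8, 0.0, -0.7],
              [0.0, 0.9,  0.0]])
x = np.array([1.0, 2.0, 3.0])

# Dense matmul: every zero is still multiplied and added.
dense_out = w @ x

# CSR matmul: only the stored non-zero entries participate.
sparse_out = sparse.csr_matrix(w) @ x

print(dense_out, sparse_out)  # same result either way
```

Both paths compute the same answer; the difference is purely in how much arithmetic the hardware actually performs.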