Reading Your Notebook 05 Numbers
Here's what students typically get on a Colab CPU:
   model  accuracy  size_kb  latency_ms
0   fp32     0.955     70.0        0.45
1   int8     0.958     22.0        2.14
"INT8 is more accurate?!" No, that is random noise. The FP32 → INT8 rounding
shifted one or two test predictions, and on a 360-sample test set each flipped
prediction moves accuracy by about 0.3%. Run the cell again with a different
seed and the order may well flip. The honest read is: same accuracy.
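The arithmetic behind that read can be checked in a few lines. This is a minimal sketch that uses only the numbers quoted above (360 test samples, accuracies 0.955 and 0.958); the binomial standard error is a standard rule of thumb added here for illustration and is not something the notebook itself computes.

```python
import math

n_test = 360       # test-set size quoted in the text
acc_fp32 = 0.955   # accuracies from the results table
acc_int8 = 0.958

# One flipped prediction moves accuracy by 1/n.
step = 1 / n_test

# Number of predictions that separate the two models.
flipped = round((acc_int8 - acc_fp32) * n_test)

# Binomial standard error of an accuracy estimate on n samples
# (an illustrative rule of thumb, not part of the notebook).
std_err = math.sqrt(acc_fp32 * (1 - acc_fp32) / n_test)

print(f"one prediction     = {step:.2%} of accuracy")
print(f"gap between models = {flipped} prediction(s)")
print(f"standard error     = {std_err:.2%}")
```

The gap works out to a single prediction, while the standard error of the accuracy estimate is roughly 1%, so a 0.3% difference is well inside the noise.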
"INT8 is slower?!" Yes, at this model size. The matmul uses a 64 × 128
weight matrix; the per-call overhead of "look at the activations, pick a scale,
quantize" costs more than the INT8 matmul saves. The speed win arrives once the
matrices are 1000× bigger, which is exactly the regime real LLMs live in.
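To make the per-call overhead concrete, here is a minimal NumPy sketch of the "look at the activations, pick a scale, quantize" steps wrapped around a 64 × 128 matmul. It assumes symmetric per-tensor scaling and a single input row; the notebook may use a different scheme, and the helper names `quantize_int8` and `dynamic_int8_matmul` are hypothetical.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: float array -> (int8 array, scale)."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dynamic_int8_matmul(x, w_q, w_scale):
    # Per-call overhead: scan the activations, pick a scale, quantize.
    x_q, x_scale = quantize_int8(x)
    # Integer matmul, accumulated in int32 so the products cannot overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    # Rescale the integer result back to float.
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)  # shape from the text
x = rng.standard_normal((1, 64)).astype(np.float32)    # one input row (assumed)

w_q, w_scale = quantize_int8(w)   # weights are quantized once, ahead of time
y_fp32 = x @ w                    # FP32 path: a single matmul
y_int8 = dynamic_int8_matmul(x, w_q, w_scale)

rel_err = float(np.linalg.norm(y_int8 - y_fp32) / np.linalg.norm(y_fp32))
print(f"relative error of the INT8 path: {rel_err:.4f}")
```

The FP32 path is one matmul, while the INT8 path adds a scan, a quantize pass, and a rescale on every call. NumPy's integer matmul is not an optimized INT8 kernel, so this illustrates the extra steps rather than reproducing the notebook's latencies.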
Run Notebook 05 to see this on your machine.