Interactive Explainer
Object Detection, with a Real Detector
A pretrained SSD-MobileNetV2, trained on the full COCO dataset (80 classes), loads in your browser and runs live on whatever photo you feed it. You draw ground-truth boxes, it predicts, and everything on this page—IoU, precision, recall, F1, non-max suppression, average precision—is computed against the real detector's real output.
A real detector running in your browser
The model loaded below is SSD (Single-Shot Detector) with a MobileNetV2 backbone, trained on MS-COCO to detect 80 everyday classes (person, dog, car, laptop, bottle, …). It runs entirely on your GPU via TensorFlow.js, with no network calls once the model is cached. Everything you see in the rest of the article is computed from its real predictions on your photo.
Pick a photo
Six CC-licensed stock photos chosen for COCO coverage.
A detection is four numbers plus a label
Every entry in the model's output list looks like this:
The box is given in pixel coordinates on your image. The class is one of COCO's 80 categories. The score is the model's confidence. Below is the live list of every detection above a 5% confidence threshold—that's usually everything the SSD proposes. Click a row to highlight that box on the image.
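Concretely, one element of the array returned by coco-ssd's `model.detect(img)` has this shape (the values here are made up for illustration):

```javascript
// One coco-ssd detection: bbox is [x, y, width, height] in pixels.
const detection = {
  bbox: [123, 45, 210, 330], // top-left corner plus width and height
  class: 'dog',              // one of COCO's 80 category names
  score: 0.92,               // model confidence in [0, 1]
};

// Corner form [x1, y1, x2, y2] is handier for the IoU math later.
function toCorners([x, y, w, h]) {
  return [x, y, x + w, y + h];
}
```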
Raw output
| # | class | score | box |
|---|---|---|---|
| Waiting for model… | |||
Intersection over Union (IoU)
The single most important number in detection. IoU measures how much two boxes overlap, scaled so that identical boxes score 1 and disjoint boxes score 0:

IoU(A, B) = area(A ∩ B) / area(A ∪ B)
Below, you draw the ground truth. Click the Draw GT button, drag a rectangle around whichever object you want to evaluate, and release. The nearest detection is matched automatically and the IoU pops out. Draw as many as you like. Click Clear to reset.
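For reference, the computation itself is only a few lines. This is a sketch for axis-aligned boxes in coco-ssd's [x, y, width, height] form (the function name is ours):

```javascript
// IoU of two axis-aligned boxes given as [x, y, width, height].
function iou([ax, ay, aw, ah], [bx, by, bw, bh]) {
  // Overlap along each axis, clamped at zero for disjoint boxes.
  const ix = Math.max(0, Math.min(ax + aw, bx + bw) - Math.max(ax, bx));
  const iy = Math.max(0, Math.min(ay + ah, by + bh) - Math.max(ay, by));
  const inter = ix * iy;
  const union = aw * ah + bw * bh - inter;
  return union > 0 ? inter / union : 0;
}

iou([0, 0, 10, 10], [0, 0, 10, 10]); // identical boxes → 1
iou([0, 0, 10, 10], [20, 20, 5, 5]); // disjoint boxes → 0
```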
Confidence thresholding
The detector emits dozens of low-confidence guesses alongside its good ones. In production, you pick a score threshold; only detections above the bar survive. Slide the threshold and watch boxes appear and vanish.
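The slider is nothing more than a filter over the raw detection list; a sketch, using detection objects shaped like coco-ssd's output:

```javascript
// Keep only detections whose confidence clears the bar.
function applyThreshold(detections, minScore) {
  return detections.filter(d => d.score >= minScore);
}

const raw = [
  { class: 'dog', score: 0.92 },
  { class: 'dog', score: 0.31 },
  { class: 'cat', score: 0.07 },
];
applyThreshold(raw, 0.5); // → only the 0.92 dog survives
```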
Precision, Recall, F1 vs your ground truth
If you've drawn any GT boxes in Step 2, we match them to the surviving detections by IoU ≥ 0.5 (the classic PASCAL-VOC true-positive rule; COCO additionally averages over IoU thresholds from 0.5 to 0.95). Unmatched GTs are false negatives; unmatched detections are false positives.
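That matching, and the metrics that fall out of it, can be sketched like this (`iouFn` is an assumed helper that returns the IoU of a detection's box and a GT box; names are ours):

```javascript
// Greedy matching of detections to ground truth at IoU >= 0.5,
// then precision / recall / F1.
function prf1(detections, groundTruths, iouFn, iouThresh = 0.5) {
  const matchedGT = new Set();
  let tp = 0;
  // Highest-confidence detections claim GT boxes first.
  for (const det of [...detections].sort((a, b) => b.score - a.score)) {
    let best = -1, bestIoU = iouThresh;
    groundTruths.forEach((gt, i) => {
      const v = iouFn(det, gt);
      if (!matchedGT.has(i) && v >= bestIoU) { best = i; bestIoU = v; }
    });
    if (best >= 0) { matchedGT.add(best); tp++; }
  }
  const fp = detections.length - tp;   // unmatched detections
  const fn = groundTruths.length - tp; // unmatched GT boxes
  const precision = tp ? tp / (tp + fp) : 0;
  const recall = tp ? tp / (tp + fn) : 0;
  const f1 = precision + recall
    ? (2 * precision * recall) / (precision + recall) : 0;
  return { tp, fp, fn, precision, recall, f1 };
}
```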
Non-Max Suppression (NMS)
Detectors fire on the same object many times. NMS is the classical deduplication rule:
- Sort detections by confidence, highest first.
- Keep the top detection.
- Discard every other detection whose IoU with the kept one exceeds the NMS threshold.
- Repeat on the remaining detections.
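The four steps above can be sketched in a few lines (`iouFn` is an assumed helper returning the IoU of two detections' boxes):

```javascript
// Classic (hard) NMS: greedily keep the most confident detection,
// discard everything that overlaps it too much, repeat.
function nms(detections, iouFn, iouThresh = 0.5) {
  const sorted = [...detections].sort((a, b) => b.score - a.score);
  const kept = [];
  while (sorted.length) {
    const top = sorted.shift(); // highest remaining confidence
    kept.push(top);
    // Walk backwards so splicing doesn't skip elements.
    for (let i = sorted.length - 1; i >= 0; i--) {
      if (iouFn(top, sorted[i]) > iouThresh) sorted.splice(i, 1);
    }
  }
  return kept;
}
```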
COCO-SSD already applies its own NMS internally. To see NMS in action, we load the detector with maxNumBoxes: 40 and an internal IoU threshold of 1.0, meaning nothing is suppressed at the model level. You then apply NMS yourself below.
NMS trace
| step | action | class | score | against |
|---|---|---|---|---|
| Waiting… | ||||
Detector families
The SSD running on this page is one point in a bigger design space. The three dominant families:
| Family | How it proposes boxes | Representatives | Gotcha |
|---|---|---|---|
| Anchor-based (this page) | Dense grid of prior boxes at many scales / aspect ratios; each predicts class + offsets. | SSD, Faster R-CNN, RetinaNet, YOLOv3/v4/v5 | Needs NMS. Anchor hyperparameters are sensitive. |
| Anchor-free / point-based | Per pixel (or cell): predict "am I a centre?" + box extents. | FCOS, CenterNet, YOLOX, YOLOv8 | Assignment loss is trickier; still typically uses NMS. |
| Set-based (DETR style) | Fixed set of learned queries; Hungarian matching against GT. | DETR, Deformable DETR, DINO | NMS-free. Trains slowly; originally weak on small objects. |
Precision–Recall curve and AP
A single confidence threshold picks one point on a precision-recall curve. Benchmarks want the whole curve. If you've drawn GT boxes, we sweep the threshold from 1 down to 0, plot precision against recall, and integrate to get Average Precision (AP).
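Under the hood that integration can look like the following sketch: rank detections by descending score, flag each as matched or unmatched, accumulate precision and recall point by point, then integrate with all-point interpolation (names are ours):

```javascript
// Average Precision from ranked detections.
// `hits` is the detections sorted by descending score, each flagged
// true (matched a GT box) or false; `numGT` is the GT box count.
function averagePrecision(hits, numGT) {
  let tp = 0, fp = 0;
  const prec = [], rec = [];
  for (const hit of hits) {
    hit ? tp++ : fp++;
    prec.push(tp / (tp + fp));
    rec.push(tp / numGT);
  }
  // Make precision monotonically non-increasing from the right.
  for (let i = prec.length - 2; i >= 0; i--) {
    prec[i] = Math.max(prec[i], prec[i + 1]);
  }
  // Integrate: sum precision times each recall step.
  let ap = 0, prevRec = 0;
  for (let i = 0; i < prec.length; i++) {
    ap += prec[i] * (rec[i] - prevRec);
    prevRec = rec[i];
  }
  return ap;
}
```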
Average Precision on this photo
Area under the PR curve computed from your drawn GT boxes. Usually improves when you draw GT around visually clear objects and shrinks on crowded or occluded scenes.
Four things that still get people
"A 99% confidence means the model is right."
Modern detectors are badly calibrated. Upload a photo with a borderline case and you'll see SSD spike to 0.9 on plausible-looking but wrong boxes. Use the PR curve, not the raw number.
"NMS is just cleanup."
On crowded scenes, classic NMS deletes real, distinct objects that overlap heavily enough to look like duplicates. Try a photo with two cats sitting close and watch one box get killed. That's Soft-NMS / DETR territory.
"mAP summarises everything."
Maybe you want 100% recall at any precision (medical scan), or
100% precision at any recall (autopilot). Always plot the full
PR curve before picking a model.
"A bigger box is a safer box."
A prediction that strictly contains the truth has IoU = area(truth) / area(pred). Huge "safety" boxes tank IoU. Tight wins.
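A quick sanity check on that last point: wrap a perfect 100×100 detection in a "safety" box twice as wide and twice as tall, and IoU collapses to a quarter:

```javascript
// When the prediction strictly contains the truth, the intersection
// is the truth box and the union is the prediction box, so
// IoU = area(truth) / area(prediction).
const truthArea = 100 * 100;  // tight, correct box
const safetyArea = 200 * 200; // "safer" box, doubled in each dimension
const iouContained = truthArea / safetyArea; // → 0.25
```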