
Interactive Explainer

Object Detection, with a Real Detector

A pretrained SSD-MobileNetV2, trained on the full COCO dataset (80 classes), loads in your browser and runs live on whatever photo you feed it. You draw ground-truth boxes, it predicts, and every metric on this page—IoU, precision, recall, F1, non-max suppression, average precision—is computed against the real detector's real output.

Prelude

A real detector running in your browser

The model loaded below is SSD (Single-Shot Detector) with a MobileNetV2 backbone, trained on MS-COCO to detect 80 everyday classes (person, dog, car, laptop, bottle, …). It runs entirely on your GPU via TensorFlow.js; no network calls once the model is cached. Everything you see in the rest of the article is computed from its real predictions on your photo.

Pick a photo

Six CC-licensed stock photos chosen for COCO coverage.

The detector will draw its predictions here as soon as the model is ready and a photo is loaded.
Raw detections
Inference time
Detector SSD-MobileNetV2
Step 1

A detection is four numbers plus a label

Every entry in the model's output list looks like this:
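For reference, here is the shape of one entry, using the field names from the `@tensorflow-models/coco-ssd` package's `model.detect()` output (the pixel values below are invented for illustration):

```javascript
// One detection as returned by coco-ssd's model.detect(); values invented.
const detection = {
  bbox: [243, 89, 182, 310], // [x, y, width, height] in image pixels
  class: 'dog',              // one of the 80 COCO category names
  score: 0.87,               // model confidence in [0, 1]
};
```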

The box is given in pixel coordinates on your image. The class is one of COCO's 80 categories. The score is the model's confidence. Below is the live list of every detection above a 5% confidence threshold—that's usually everything the SSD proposes. Click a row to highlight that box on the image.

Click any detection to highlight it.

Raw output

# class score box
Waiting for model…
Step 2

Intersection over Union (IoU)

The single most important number in detection. IoU measures how much two boxes overlap, scaled so that identical boxes score 1 and disjoint boxes score 0:

IoU(A, B) = area(A ∩ B) / area(A ∪ B)
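In code, IoU is a few lines of min/max arithmetic. A minimal sketch, assuming boxes in coco-ssd's `[x, y, width, height]` format:

```javascript
// IoU for axis-aligned boxes given as [x, y, width, height].
function iou(a, b) {
  // Overlap along each axis (zero if the boxes are disjoint on that axis).
  const ix = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const union = a[2] * a[3] + b[2] * b[3] - inter;
  return union > 0 ? inter / union : 0;
}
```

Identical boxes give 1, disjoint boxes give 0, and a box shifted halfway across its twin gives 1/3 (overlap 5000, union 15000 for two 100×100 boxes offset by 50).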

Below, you draw the ground truth. Click the Draw GT button, drag a rectangle around whichever object you want to evaluate, and release. The nearest detection is matched automatically and the IoU pops out. Draw as many as you like. Click Clear to reset.

Tip: you can also drag corners of existing boxes to resize.
Green = your GT. Orange = best-matching detection. Purple shade = intersection.
GT boxes drawn: 0
Mean IoU (best-match)
Last IoU
The 0.5 threshold is a convention, not a law. COCO averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 precisely because a single threshold rewards lazy slightly-off boxes as much as pixel-perfect ones. Slide the GT box to tighter and looser versions of the same object and see how IoU changes.
Step 3

Confidence thresholding

The detector emits dozens of low-confidence guesses alongside its good ones. In production, you pick a score threshold; only detections above the bar survive. Slide the threshold and watch boxes appear and vanish.
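The rule itself is a one-line filter; the example detections below are invented:

```javascript
// Thresholding: keep only detections whose confidence clears the bar.
const detections = [
  { class: 'dog',    score: 0.87 },
  { class: 'person', score: 0.52 },
  { class: 'chair',  score: 0.12 },
];
const threshold = 0.4; // the value the slider controls
const survivors = detections.filter(d => d.score >= threshold);
```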

ground-truth
above threshold
below threshold
Live threshold sweep on the current detector output.

Precision, Recall, F1 vs your ground truth

If you've drawn any GT boxes in Step 2, we match them to the surviving detections by IoU ≥ 0.5 (the classic PASCAL VOC true-positive rule; COCO sweeps this threshold, as noted in Step 2). Unmatched GTs are false negatives; unmatched detections are false positives.
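That matching can be sketched as a greedy pass in descending confidence, each GT claimable at most once (a common convention; the `iou` helper is inlined so the sketch is self-contained):

```javascript
// IoU for [x, y, width, height] boxes.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  return inter > 0 ? inter / (a[2] * a[3] + b[2] * b[3] - inter) : 0;
}

// Greedy matching at a fixed IoU cutoff, then precision / recall / F1.
function evaluate(detections, groundTruth, iouThresh = 0.5) {
  const byScore = [...detections].sort((a, b) => b.score - a.score);
  const matched = new Set();
  let tp = 0;
  for (const det of byScore) {
    let best = -1, bestIoU = iouThresh;
    groundTruth.forEach((gt, i) => {
      if (matched.has(i)) return;               // each GT matches at most once
      const v = iou(det.bbox, gt);
      if (v >= bestIoU) { bestIoU = v; best = i; }
    });
    if (best >= 0) { matched.add(best); tp += 1; }
  }
  const fp = detections.length - tp;            // unmatched detections
  const fn = groundTruth.length - tp;           // unmatched ground truths
  const precision = tp + fp > 0 ? tp / (tp + fp) : 0;
  const recall = tp + fn > 0 ? tp / (tp + fn) : 0;
  const f1 = precision + recall > 0
    ? (2 * precision * recall) / (precision + recall) : 0;
  return { tp, fp, fn, precision, recall, f1 };
}
```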

True positives: 0
False positives: 0
False negatives: 0
Precision
Recall
F1
Draw a few GT boxes first: with no ground truth, recall is undefined and precision pins at zero. Draw 2–3 GT boxes around real objects, then slide the confidence threshold. Watch the F1 curve peak near the model's "natural" operating point, typically around 0.3–0.5 for SSD.
Step 4

Non-Max Suppression (NMS)

Detectors fire on the same object many times. NMS is the classical deduplication rule:

  1. Sort detections by confidence, highest first.
  2. Keep the top detection.
  3. Discard every other detection whose IoU with the kept one exceeds the NMS threshold.
  4. Repeat on the remaining detections.
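The four steps above can be sketched in a few lines. This version is class-agnostic for brevity (per-class NMS just runs it once per label), with the IoU helper inlined:

```javascript
// IoU for [x, y, width, height] boxes.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  return inter > 0 ? inter / (a[2] * a[3] + b[2] * b[3] - inter) : 0;
}

// Classic (hard) non-max suppression.
function nms(detections, iouThresh = 0.5) {
  const pool = [...detections].sort((a, b) => b.score - a.score); // 1. sort
  const kept = [];
  while (pool.length > 0) {
    const top = pool.shift();                                     // 2. keep top
    kept.push(top);
    for (let i = pool.length - 1; i >= 0; i--) {                  // 3. suppress
      if (iou(top.bbox, pool[i].bbox) > iouThresh) pool.splice(i, 1);
    }
  }                                                               // 4. repeat
  return kept;
}
```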

COCO-SSD already applies its own NMS internally. To see NMS in action, we load the detector with maxNumBoxes: 40 and an internal IoU threshold of 1.0—meaning nothing is suppressed at the model level. You then apply NMS yourself below.

Faint = suppressed. Solid = kept.
Pre-NMS
Kept
Suppressed

NMS trace

step | action | class | score | against
Waiting…
Step 5

Detector families

The SSD running on this page is one point in a bigger design space. The three dominant families:

Family | How it proposes boxes | Representatives | Gotcha
Anchor-based (this page) | Dense grid of prior boxes at many scales / aspect ratios; each predicts class + offsets. | SSD, Faster R-CNN, RetinaNet, YOLOv3/v4/v5 | Needs NMS. Anchor hyperparameters are sensitive.
Anchor-free / point-based | Per pixel (or cell): predict "am I a centre?" + box extents. | FCOS, CenterNet, YOLOX, YOLOv8 | Assignment loss is trickier; still typically uses NMS.
Set-based (DETR style) | Fixed set of learned queries; Hungarian matching against GT. | DETR, Deformable DETR, DINO | NMS-free. Trains slowly; originally weak on small objects.
Step 6

Precision–Recall curve and AP

A single confidence threshold picks one point on a precision-recall curve. Benchmarks want the whole curve. If you've drawn GT boxes, we sweep the threshold from 1 down to 0, plot precision against recall, and integrate to get Average Precision (AP).
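A sketch of that sweep-and-integrate, assuming each detection has already been labeled TP or FP by the Step 3 matching. This uses all-point interpolation (as in later PASCAL VOC evaluations; COCO instead samples 101 recall points):

```javascript
// Average Precision by all-point interpolation. `isTP` lists each detection's
// TP/FP verdict in descending-confidence order; `numGT` counts GT boxes.
function averagePrecision(isTP, numGT) {
  let tp = 0, fp = 0;
  const prec = [], rec = [];
  for (const hit of isTP) {
    if (hit) tp += 1; else fp += 1;
    prec.push(tp / (tp + fp));
    rec.push(numGT > 0 ? tp / numGT : 0);
  }
  // Make precision monotonically non-increasing (the PR envelope)...
  for (let i = prec.length - 2; i >= 0; i--) {
    prec[i] = Math.max(prec[i], prec[i + 1]);
  }
  // ...then sum rectangles between successive recall values.
  let ap = 0, prevRec = 0;
  for (let i = 0; i < prec.length; i++) {
    ap += (rec[i] - prevRec) * prec[i];
    prevRec = rec[i];
  }
  return ap;
}
```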

Click "Draw GT" in Step 2 to populate this curve.

Average Precision on this photo

Area under the PR curve computed from your drawn GT boxes. Usually improves when you draw GT around visually clear objects and shrinks on crowded or occluded scenes.

Step 7

Four things that still get people

Myth

"A 99% confidence means the model is right."
Modern detectors are badly calibrated. Upload a photo with a borderline case and you'll see SSD spike to 0.9 on plausible-looking but wrong boxes. Use the PR curve, not the raw number.

Myth

"NMS is just cleanup."
On crowded scenes, classic NMS deletes real, distinct objects that happen to overlap. Try a photo with two cats sitting close together and watch one box get killed. That's Soft-NMS / DETR territory.

Myth

"mAP summarises everything."
Maybe you want 100% recall at any precision (medical scan), or 100% precision at any recall (autopilot). Always plot the full PR curve before picking a model.

Myth

"A bigger box is a safer box."
A prediction that strictly contains the truth has IoU = area(truth) / area(pred): a 200×200 "safety" box around a 100×100 object scores only 0.25, a clean miss at the usual 0.5 threshold. Huge boxes tank IoU. Tight wins.

Final takeaway. You just ran a real COCO detector on real images, measured its output against ground truth you drew by hand, and watched precision, recall, NMS, and AP all behave as the math predicts. The rest of modern detection research is about replacing SSD with better backbones (ViT, ConvNeXt), better assignment (set-based, anchor-free), and better losses—but every piece ultimately lands on this same scoreboard.