Vanishing Gradient Playground

Watch gradients decay—or survive—through deep networks

Network Depth

10 layers

Activation Function

Gradient at input layer:

Effectively zero — this layer cannot learn

Input layer (deepest) Output layer

Gradient > 10⁻³

10⁻⁶ – 10⁻³

Gradient < 10⁻⁶

Key Insight