Vanishing Gradient Playground

Watch gradients decay—or survive—through deep networks

10 layers
Gradient at input layer:
Effectively zero — this layer cannot learn
Layers shown from input layer (deepest) to output layer
Gradient > 10⁻³
Gradient between 10⁻⁶ and 10⁻³
Gradient < 10⁻⁶
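
The decay shown above can be reproduced with a minimal sketch. It is not the playground's own code: it assumes a chain of scalar sigmoid layers with a shared weight of 1.0 and an input of 0.0, and the names layer_gradients and bucket are hypothetical. Since the sigmoid derivative never exceeds 0.25, ten such layers scale the gradient by at most 0.25^10 (about 9.5 × 10⁻⁷), which already falls in the lowest legend band.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_gradients(num_layers=10, weight=1.0, x=0.0):
    """Gradient magnitude of the final output w.r.t. each layer's input.

    Assumed toy model: a_k = sigmoid(weight * a_{k-1}), with a_0 = x.
    Index 0 is the input layer; the last index is the output layer.
    """
    # Forward pass: remember each layer's pre-activation.
    pre_activations = []
    a = x
    for _ in range(num_layers):
        z = weight * a
        pre_activations.append(z)
        a = sigmoid(z)

    # Backward pass: chain rule, one factor weight * sigmoid'(z) per layer.
    grads = [0.0] * num_layers
    g = 1.0
    for k in reversed(range(num_layers)):
        s = sigmoid(pre_activations[k])
        g *= weight * s * (1.0 - s)
        grads[k] = abs(g)
    return grads


def bucket(g):
    """Map a gradient magnitude to the three legend bands (boundary handling assumed)."""
    if g > 1e-3:
        return "> 1e-3"
    if g >= 1e-6:
        return "1e-6 to 1e-3"
    return "< 1e-6"


grads = layer_gradients(num_layers=10)
for layer, g in enumerate(grads, start=1):
    print(f"layer {layer:2d}: {g:.3e}  [{bucket(g)}]")
```

In this toy setting every layer multiplies the gradient by a factor below 1, so the magnitude shrinks monotonically from the output layer back to the input layer. Changing num_layers or weight shows how quickly the input-layer gradient crosses each band.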
Key Insight