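[Figure: gradient magnitude per layer in a 10-layer network, from the output layer back to the input layer (deepest). Color legend: gradient > 10^-3; gradient between 10^-6 and 10^-3; gradient < 10^-6. Gradient at input layer: effectively zero, so this layer cannot learn.]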
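The shrinkage the figure depicts can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the source: each layer is taken to scale the backpropagated gradient by at most 0.25 (the maximum slope of a sigmoid activation), weights are ignored, and the gradient at the output layer is normalized to 1. The names `gradient_bounds` and `bucket` are hypothetical helpers, and the thresholds mirror the figure's legend.

```python
# Assumption: every layer multiplies the backpropagated gradient by at most
# 0.25, the maximum slope of the sigmoid. Weights are ignored for simplicity.
SIGMOID_MAX_SLOPE = 0.25
NUM_LAYERS = 10


def gradient_bounds(num_layers, per_layer_factor):
    """Upper bound on gradient magnitude after backpropagating 1..num_layers layers."""
    bounds = []
    grad = 1.0  # gradient magnitude at the output layer, normalized to 1
    for _ in range(num_layers):
        grad *= per_layer_factor
        bounds.append(grad)
    return bounds


def bucket(grad):
    """Map a gradient magnitude to the three ranges used in the figure legend."""
    if grad > 1e-3:
        return "> 10^-3"
    if grad >= 1e-6:
        return "10^-6 to 10^-3"
    return "< 10^-6"


bounds = gradient_bounds(NUM_LAYERS, SIGMOID_MAX_SLOPE)
for depth, grad in enumerate(bounds, start=1):
    # depth counts layers back from the output; depth 10 is the input layer
    print(f"{depth:2d} layers from output: bound {grad:.3e}  ({bucket(grad)})")
```

Under these assumptions the bound at the tenth layer is 0.25^10, which is below 10^-6, matching the "effectively zero" range in the figure. Real networks will differ, since weight magnitudes and the choice of activation change the per-layer factor.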
Key Insight