Lesson 5 · 11 min

Gradient descent — how models actually learn

Pick a loss. Compute its gradient. Step downhill. Repeat. That's every neural network ever trained.

The whole picture in one paragraph

A model has parameters θ (often millions of them). For any input, it produces an output. We measure how wrong the output is with a loss function L(θ). The gradient ∇L(θ) is a vector with one entry per parameter; it points in the direction of steepest increase in the loss. So we step the opposite way (downhill): θ ← θ − η · ∇L(θ). The learning rate η controls how big each step is.
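Here is a minimal sketch of that loop in Python, fitting a two-parameter linear model with mean squared error. The data, learning rate, and step count are illustrative assumptions, not from the lesson; the point is only the shape of the update θ ← θ − η · ∇L(θ).

```python
import numpy as np

# Toy data: y = 3x + 1 plus noise. Gradient descent should recover w ≈ 3, b ≈ 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1 + 0.1 * rng.normal(size=100)

# Parameters theta = (w, b), initialized arbitrarily.
w, b = 0.0, 0.0
eta = 0.1  # learning rate

for step in range(500):
    pred = w * x + b
    err = pred - y
    loss = np.mean(err ** 2)       # L(theta): mean squared error
    grad_w = 2 * np.mean(err * x)  # dL/dw
    grad_b = 2 * np.mean(err)      # dL/db
    w -= eta * grad_w              # theta <- theta - eta * grad(L)
    b -= eta * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.4f}")
```

A real network has millions of parameters instead of two, and the gradient comes from backpropagation instead of a hand-derived formula, but the update step is exactly this line: subtract the learning rate times the gradient.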