Lesson 4 · 9 min
Probability for ML, briefly
You don't need to be a probabilist. You need: distributions, expectation, log-probs, entropy.
The four ideas
- Distribution — for each possible outcome, a probability. The probabilities sum to 1. The model's output is a distribution over the next token.
- Expectation — a weighted average: E[X] = Σ p(x) · x. The expected reward, the expected loss, the expected next token are all expectations (first sketch below).
- Log-probabilities — probabilities get tiny (1e-9 is normal), and multiplying many tiny numbers underflows floating-point arithmetic. So we work in log space: log(p). Multiplications become additions, since log(a · b) = log(a) + log(b) (second sketch below).
- Entropy — how 'spread out' a distribution is. H(p) = -Σ p(x) log(p(x)). High entropy = near-uniform = uncertain. Low entropy = concentrated = confident.
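
To make the first two ideas concrete, here is a minimal plain-Python sketch. The logits and per-token rewards are made-up numbers, not anything a real model produced: softmax turns logits into a distribution, and an expectation is just a probability-weighted sum.

```python
import math

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]

# Softmax: exponentiate, then normalize. The result is a distribution:
# every entry is positive and the entries sum to 1.
exps = [math.exp(z) for z in logits]
total = sum(exps)
p = [e / total for e in exps]
print(sum(p))  # 1.0 (up to floating-point rounding)

# Expectation: a weighted average. Here, the expected value of a
# made-up reward assigned to each token: E[R] = Σ p(x) · r(x).
rewards = [1.0, 0.0, 0.5, -2.0]  # hypothetical per-token rewards
expected_reward = sum(pi * ri for pi, ri in zip(p, rewards))
print(expected_reward)
```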
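
And a second sketch for the last two ideas. The 40-token sequence and the example distributions are invented for illustration: multiplying many small probabilities underflows a float to 0.0 while summing their logs stays finite, and entropy is maximal for the uniform distribution, small for a concentrated one.

```python
import math

# Log-probabilities: a toy 40-token sequence, each token at probability 1e-9.
tiny = [1e-9] * 40
print(math.prod(tiny))                 # 0.0 -- the product underflows
print(sum(math.log(q) for q in tiny))  # ≈ -829.0 -- fine in log space

# Entropy: H(p) = -Σ p(x) log(p(x)).
def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: log(4) ≈ 1.39, maximal
print(entropy([0.97, 0.01, 0.01, 0.01]))  # concentrated: ≈ 0.17, confident
```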