Lesson 4 · 9 min
Probability for ML, briefly
You don't need to be a probabilist. You need: distributions, expectation, log-probs, entropy.
The four ideas
- Distribution — for each possible outcome, a probability. The probabilities sum to 1. The model's output is a distribution over the next token.
- Expectation — a weighted average: E[X] = Σ p(x) · x. The expected reward, the expected loss, the expected next token are all expectations (first sketch below).
- Log-probabilities — probabilities get tiny (1e-9 is normal), and multiplying many tiny numbers underflows floating-point arithmetic. So we work in log space: log(p). Multiplications become additions, since log(a · b) = log(a) + log(b) (second sketch below).
- Entropy — how 'spread out' a distribution is. H(p) = -Σ p(x) log(p(x)). High entropy = near-uniform = uncertain. Low entropy = concentrated = confident.
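
To make the first two ideas concrete, here is a minimal plain-Python sketch. The logits and per-token rewards are made-up numbers, not anything a real model produced: softmax turns logits into a distribution, and an expectation is just a probability-weighted sum.

```python
import math

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]

# Softmax: exponentiate, then normalize. The result is a distribution:
# every entry is positive and the entries sum to 1.
exps = [math.exp(z) for z in logits]
total = sum(exps)
p = [e / total for e in exps]
print(sum(p))  # 1.0 (up to floating-point rounding)

# Expectation: a weighted average. Here, the expected value of a
# made-up reward assigned to each token: E[R] = Σ p(x) · r(x).
rewards = [1.0, 0.0, 0.5, -2.0]  # hypothetical per-token rewards
expected_reward = sum(pi * ri for pi, ri in zip(p, rewards))
print(expected_reward)
```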
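
And a second sketch for the last two ideas. The 40-token sequence and the example distributions are invented for illustration: multiplying many small probabilities underflows a float to 0.0 while summing their logs stays finite, and entropy is maximal for the uniform distribution, small for a concentrated one.

```python
import math

# Log-probabilities: a toy 40-token sequence, each token at probability 1e-9.
tiny = [1e-9] * 40
print(math.prod(tiny))                 # 0.0 -- the product underflows
print(sum(math.log(q) for q in tiny))  # ≈ -829.0 -- fine in log space

# Entropy: H(p) = -Σ p(x) log(p(x)).
def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: log(4) ≈ 1.39, maximal
print(entropy([0.97, 0.01, 0.01, 0.01]))  # concentrated: ≈ 0.17, confident
```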