Lesson 5 · 8 min
Positional encoding — why order matters
Self-attention is order-blind. We have to inject "where am I in the sequence" by hand.
A surprising flaw
Self-attention treats its inputs as a set, not a sequence: permuting the tokens simply permutes the outputs. "Cat ate fish" and "Fish ate cat" yield the same per-token representations if the only signal is the token identities.
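A minimal sketch makes this concrete. Below is a single-head self-attention layer with random weights (the weights, dimensions, and seed are illustrative, not from any particular model): reversing the input order just reverses the output rows, so no information about position survives.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding size, chosen arbitrarily
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    # Standard scaled dot-product attention over all tokens in X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V

X = rng.standard_normal((3, d))  # stand-in embeddings for "cat ate fish"
perm = [2, 1, 0]                 # reversed order: "fish ate cat"

out_forward = self_attention(X)
out_reversed = self_attention(X[perm])

# Permuting the inputs only permutes the outputs the same way:
# each token gets an identical representation either way.
assert np.allclose(out_forward[perm], out_reversed)
```

This property is called permutation equivariance, and it is exactly why positional information has to be injected separately.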
We fix this by adding a positional encoding to each token's embedding before the first layer.
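One common choice (the sinusoidal scheme from the original Transformer) can be sketched as follows; the sequence length and embedding size here are arbitrary toy values:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

# Stand-in token embeddings for a 3-token sentence.
emb = np.random.default_rng(1).standard_normal((3, 8))
x = emb + sinusoidal_encoding(3, 8)  # added before the first layer
```

Because each position gets a unique, deterministic pattern of sines and cosines, identical tokens at different positions now enter the first layer with different vectors.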