Lesson 2 · 11 min
Self-instruct and instruction-tuning data generation
The self-instruct technique uses an LLM to generate (instruction, response) pairs from a small human-written seed set. Many of the best-known instruction-following datasets were built this way at scale.
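Concretely, each generated example is just an instruction paired with a model-written response. A minimal sketch of one record (the content here is illustrative, not from a real dataset):

```python
# One (instruction, response) training example. Alpaca-style datasets
# additionally split optional context into a separate "input" field.
example = {
    "instruction": "Rewrite the following sentence in the passive voice: "
                   "'The committee approved the budget.'",
    "response": "The budget was approved by the committee.",
}
```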
The self-instruct loop
Self-instruct (Wang et al., 2022) is the core pattern behind datasets like Alpaca (52k examples generated with OpenAI's text-davinci-003) and many of the synthetic instruction datasets that followed. The idea (a code sketch follows the list):
- Start with ~175 seed instructions written by humans
- Use the LLM to generate new instructions that are diverse and novel, filtering near-duplicates aggressively (the original paper drops any candidate whose ROUGE-L overlap with an existing instruction exceeds 0.7)
- For each new instruction, generate a response
- Filter low-quality pairs (too short, refusals, nonsensical)
- Add the surviving instructions back to the pool and loop until you have enough data
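Putting the steps together, here is a minimal sketch of the loop. The `generate(prompt)` helper is a hypothetical stand-in for whatever LLM API you use; the prompts, the few-shot count, and the quality heuristics are illustrative, not the paper's exact settings.

```python
"""Minimal self-instruct loop, assuming a hypothetical generate() wrapper."""
import random

def generate(prompt: str) -> str:
    """Hypothetical LLM call -- replace with your provider's client."""
    raise NotImplementedError

def is_low_quality(response: str) -> bool:
    """Cheap filters from the lesson: too short, refusals."""
    if len(response.split()) < 5:
        return True
    return any(m in response.lower() for m in ("i cannot", "i can't", "as an ai"))

def self_instruct(seed_instructions: list[str], target_size: int) -> list[dict]:
    pool = list(seed_instructions)   # grows as new instructions are accepted
    dataset: list[dict] = []
    while len(dataset) < target_size:
        # Few-shot prompt built from instructions sampled out of the pool.
        shots = random.sample(pool, k=min(8, len(pool)))
        prompt = "\n".join(f"Task: {t}" for t in shots) + "\nTask:"
        instruction = generate(prompt).strip()

        # Exact-match dedup for brevity; a ROUGE-L version appears below.
        if any(instruction.lower() == t.lower() for t in pool):
            continue

        response = generate(f"{instruction}\n\nResponse:").strip()
        if is_low_quality(response):
            continue

        pool.append(instruction)     # survivors seed the next round
        dataset.append({"instruction": instruction, "response": response})
    return dataset
```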
The key insight: an LLM is much better at completing a given instruction than at inventing genuinely novel ones from scratch. The human-written seed set anchors instruction diversity, few-shot examples sampled from the growing pool let the model expand outward from that anchor, and aggressive deduplication keeps the pool from collapsing onto a handful of near-duplicate templates.
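That deduplication step is where diversity is won or lost. Below is a dependency-free sketch of the ROUGE-L filter; the 0.7 threshold mirrors the original paper, while `is_near_duplicate` and the from-scratch LCS implementation are introduced here for illustration (the paper used a standard ROUGE library).

```python
"""ROUGE-L near-duplicate filter, implemented from scratch for clarity."""

def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # holds dp[i-1][j-1]
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == b[j - 1] else max(dp[j], dp[j - 1])
            prev = cur
    return dp[len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure over whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def is_near_duplicate(instruction: str, pool: list[str],
                      threshold: float = 0.7) -> bool:
    """Reject a candidate that overlaps too much with any pooled instruction."""
    return any(rouge_l(instruction, seen) > threshold for seen in pool)

# Example: a near-rewording of a pooled instruction gets rejected.
pool = ["Write a poem about autumn."]
print(is_near_duplicate("Write a short poem about autumn.", pool))  # True
```

Note that each candidate is compared against the entire pool, so this check dominates runtime as the dataset grows; real pipelines typically parallelize it.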