Lesson 2 · 11 min
Self-instruct and instruction-tuning data generation
The self-instruct technique uses an LLM to generate (instruction, response) pairs from a small human-written seed set. Many of the best-known instruction-following datasets were built this way at scale.
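Concretely, each generated example is just an instruction paired with a model-written response. A minimal sketch of one record (the content here is illustrative, not from a real dataset):

```python
# One (instruction, response) training example. Alpaca-style datasets
# additionally split optional context into a separate "input" field.
example = {
    "instruction": "Rewrite the following sentence in the passive voice: "
                   "'The committee approved the budget.'",
    "response": "The budget was approved by the committee.",
}
```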
The self-instruct loop
Self-instruct (Wang et al., 2022) is the core pattern behind datasets like Alpaca (52k examples generated with OpenAI's text-davinci-003) and many of the synthetic instruction datasets that followed. The idea (a code sketch follows the list):
- Start with ~175 seed instructions written by humans
- Use the LLM to generate new instructions that are diverse and novel, filtering near-duplicates aggressively (the original paper drops any candidate whose ROUGE-L overlap with an existing instruction exceeds 0.7)
- For each new instruction, generate a response
- Filter low-quality pairs (too short, refusals, nonsensical)
- Add the surviving instructions back to the pool and loop until you have enough data
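Putting the steps together, here is a minimal sketch of the loop. The `generate(prompt)` helper is a hypothetical stand-in for whatever LLM API you use; the prompts, the few-shot count, and the quality heuristics are illustrative, not the paper's exact settings.

```python
"""Minimal self-instruct loop, assuming a hypothetical generate() wrapper."""
import random

def generate(prompt: str) -> str:
    """Hypothetical LLM call -- replace with your provider's client."""
    raise NotImplementedError

def is_low_quality(response: str) -> bool:
    """Cheap filters from the lesson: too short, refusals."""
    if len(response.split()) < 5:
        return True
    return any(m in response.lower() for m in ("i cannot", "i can't", "as an ai"))

def self_instruct(seed_instructions: list[str], target_size: int) -> list[dict]:
    pool = list(seed_instructions)   # grows as new instructions are accepted
    dataset: list[dict] = []
    while len(dataset) < target_size:
        # Few-shot prompt built from instructions sampled out of the pool.
        shots = random.sample(pool, k=min(8, len(pool)))
        prompt = "\n".join(f"Task: {t}" for t in shots) + "\nTask:"
        instruction = generate(prompt).strip()

        # Exact-match dedup for brevity; a ROUGE-L version appears below.
        if any(instruction.lower() == t.lower() for t in pool):
            continue

        response = generate(f"{instruction}\n\nResponse:").strip()
        if is_low_quality(response):
            continue

        pool.append(instruction)     # survivors seed the next round
        dataset.append({"instruction": instruction, "response": response})
    return dataset
```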
The key insight: an LLM is much better at completing a given instruction than at inventing genuinely novel ones from scratch. The human-written seed set anchors instruction diversity, few-shot examples sampled from the growing pool let the model expand outward from that anchor, and aggressive deduplication keeps the pool from collapsing onto a handful of near-duplicate templates.
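That deduplication step is where diversity is won or lost. Below is a dependency-free sketch of the ROUGE-L filter; the 0.7 threshold mirrors the original paper, while `is_near_duplicate` and the from-scratch LCS implementation are introduced here for illustration (the paper used a standard ROUGE library).

```python
"""ROUGE-L near-duplicate filter, implemented from scratch for clarity."""

def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # holds dp[i-1][j-1]
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == b[j - 1] else max(dp[j], dp[j - 1])
            prev = cur
    return dp[len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure over whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def is_near_duplicate(instruction: str, pool: list[str],
                      threshold: float = 0.7) -> bool:
    """Reject a candidate that overlaps too much with any pooled instruction."""
    return any(rouge_l(instruction, seen) > threshold for seen in pool)

# Example: a near-rewording of a pooled instruction gets rejected.
pool = ["Write a poem about autumn."]
print(is_near_duplicate("Write a short poem about autumn.", pool))  # True
```

Note that each candidate is compared against the entire pool, so this check dominates runtime as the dataset grows; real pipelines typically parallelize it.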