Lesson 2 · 11 min

Self-instruct and instruction-tuning data generation

The self-instruct technique uses an LLM to expand a small human-written seed set into a large collection of (instruction, response) pairs. Many instruction-following datasets are bootstrapped this way at scale.

The self-instruct loop

Self-instruct (Wang et al., 2022) is the core pattern behind datasets like Alpaca (52k examples generated with OpenAI's text-davinci-003) and many of the synthetic instruction sets that followed. The idea (a code sketch follows the list):

  1. Start with ~175 seed instructions written by humans
  2. Prompt the LLM with a few sampled instructions and ask it to generate new, diverse ones, filtering near-duplicates aggressively (the original paper drops any candidate whose ROUGE-L similarity to an existing instruction exceeds 0.7)
  3. For each surviving instruction, generate a response
  4. Filter low-quality pairs (too short, refusals, nonsensical output)
  5. Add the survivors to the pool and loop until you have enough data
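A minimal sketch of the loop in Python, assuming a hypothetical `complete(prompt)` wrapper around whatever LLM API you use. The novelty filter here uses `difflib` as a lightweight stand-in for the paper's ROUGE-L threshold, and the quality heuristics are illustrative, not the paper's exact rules:

```python
import difflib
import random

SEED_INSTRUCTIONS = [
    "Summarize the following paragraph in one sentence.",
    "Translate this sentence into French.",
    # ... ~175 human-written seed tasks in the real pipeline
]

def complete(prompt: str) -> str:
    """Hypothetical wrapper around your LLM API of choice."""
    raise NotImplementedError

def too_similar(candidate: str, pool: list[str], threshold: float = 0.7) -> bool:
    # Stand-in for the paper's ROUGE-L novelty filter (step 2).
    return any(
        difflib.SequenceMatcher(None, candidate, existing).ratio() > threshold
        for existing in pool
    )

def looks_low_quality(instruction: str, response: str) -> bool:
    # Crude heuristics for step 4: too short, refusal, or degenerate output.
    refusal_markers = ("i can't", "i cannot", "as an ai")
    return (
        len(response.split()) < 5
        or response.lower().startswith(refusal_markers)
        or instruction.strip() == response.strip()
    )

def self_instruct(target_size: int = 1000) -> list[dict]:
    pool = list(SEED_INSTRUCTIONS)
    dataset = []
    while len(dataset) < target_size:
        # Step 2: few-shot prompt with sampled instructions, ask for a new one.
        shots = random.sample(pool, k=min(8, len(pool)))
        prompt = (
            "Come up with a new task instruction.\n\n"
            + "\n".join(f"Task: {s}" for s in shots)
            + "\nTask:"
        )
        instruction = complete(prompt).strip()
        if not instruction or too_similar(instruction, pool):
            continue
        # Step 3: generate a response for the new instruction.
        response = complete(f"Instruction: {instruction}\nResponse:").strip()
        # Step 4: drop low-quality pairs before they enter the dataset.
        if looks_low_quality(instruction, response):
            continue
        pool.append(instruction)
        dataset.append({"instruction": instruction, "response": response})
    return dataset
```

In practice you would batch the generation calls and check novelty against an index rather than a linear scan over the pool; the scan here is just for clarity.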

The key insight: the LLM is better at generating completions for given instructions than at inventing novel instructions from scratch. So the human seed set grounds instruction diversity, and the model expands outward from it.
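One way to keep that grounding while still letting the pool grow, following the original paper's recipe, is to build each few-shot prompt from mostly human seeds plus a few model-generated instructions (Self-Instruct samples 6 seed + 2 generated per prompt). A sketch, reusing the names from the loop above:

```python
import random

def build_instruction_prompt(seeds: list[str], generated: list[str]) -> str:
    # 6 human seeds + 2 model-generated instructions per prompt, so
    # diversity stays anchored to the seed set while the pool still
    # drifts toward new tasks.
    shots = random.sample(seeds, k=6)
    if len(generated) >= 2:
        shots += random.sample(generated, k=2)
    random.shuffle(shots)
    lines = [f"Task {i + 1}: {s}" for i, s in enumerate(shots)]
    lines.append(f"Task {len(shots) + 1}:")  # the model fills this slot
    return "Come up with a series of diverse tasks:\n\n" + "\n".join(lines)
```

Skewing the mix toward human seeds keeps the generated distribution from collapsing onto the model's own favorite task types after a few iterations.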