In the InstructGPT paper, “Training language models to follow instructions with human feedback,” the authors report that, on their prompt distribution, human labelers preferred outputs from the 1.3-billion-parameter InstructGPT model over outputs from the 175-billion-parameter GPT-3, even though InstructGPT had more than 100 times fewer parameters (175 / 1.3 ≈ 135). The result suggests that, for output quality as judged by human raters, fine-tuning a model with human feedback can matter more than raw parameter count.