
Fine-Tuning GPT-2 with Human Preferences Explained

OpenAI News

OpenAI has fine-tuned a 774-million-parameter GPT-2 language model using human feedback, aiming to align the model's outputs with the preferences of external human labelers across several tasks. The results revealed a critical nuance in AI training: while the model effectively matched the labelers' choices, those preferences did not always align with OpenAI's own internal expectations.
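The briefing does not spell out the mechanics, but the underlying OpenAI work trains a scalar reward model on pairwise human comparisons and then fine-tunes the language model against that reward. Below is a minimal sketch of the reward-model step, assuming the Hugging Face transformers GPT-2 API and placeholder comparison texts; it is an illustration of the general technique, not OpenAI's actual code.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2Model

# Reward model sketch: GPT-2 encoder plus a scalar head on the last token.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
encoder = GPT2Model.from_pretrained("gpt2")
reward_head = torch.nn.Linear(encoder.config.n_embd, 1)

def reward(texts):
    """Score each text with a scalar reward from its final non-padding token."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    hidden = encoder(**batch).last_hidden_state          # (batch, seq, n_embd)
    lengths = batch["attention_mask"].sum(dim=1) - 1     # index of last real token
    last = hidden[torch.arange(hidden.size(0)), lengths]
    return reward_head(last).squeeze(-1)

# One gradient step on a single pairwise comparison (placeholder texts).
# Loss is -log sigmoid(r_preferred - r_rejected), so the preferred sample
# is pushed toward a higher score than the rejected one.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(reward_head.parameters()), lr=1e-5
)
preferred = ["A concise, accurate summary of the article."]
rejected = ["An off-topic continuation of the article."]
optimizer.zero_grad()
loss = -F.logsigmoid(reward(preferred) - reward(rejected)).mean()
loss.backward()
optimizer.step()
print(f"pairwise loss: {loss.item():.4f}")
```

In OpenAI's published setup, the policy model is then optimized against this learned reward with reinforcement learning (PPO), with a KL penalty keeping it close to the original GPT-2.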

A key example arose in summarization. OpenAI instructed labelers to reward accuracy, but the labelers consistently preferred sentences copied directly from the source text. Consequently, the fine-tuned model learned to copy content wholesale rather than generate novel, abstractive summaries.
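One way to make this failure mode concrete is to measure what fraction of a summary's word n-grams appear verbatim in the source. The function and example strings below are purely illustrative and are not from OpenAI's evaluation:

```python
def copied_fraction(summary: str, source: str, n: int = 5) -> float:
    """Fraction of the summary's word n-grams that appear verbatim in the source."""
    s_words, src_words = summary.split(), source.split()
    if len(s_words) < n:
        return 0.0
    source_ngrams = {tuple(src_words[i:i + n]) for i in range(len(src_words) - n + 1)}
    summary_ngrams = [tuple(s_words[i:i + n]) for i in range(len(s_words) - n + 1)]
    hits = sum(ng in source_ngrams for ng in summary_ngrams)
    return hits / len(summary_ngrams)

# Hypothetical illustration: a summary lifted from the source scores near 1.0.
source = "The committee voted on Tuesday to approve the new budget after hours of debate."
copied = "The committee voted on Tuesday to approve the new budget."
print(copied_fraction(copied, source))  # ~0.83: most n-grams are lifted verbatim
```

A model producing genuinely abstractive summaries would score much lower on a metric like this, since its phrasing would diverge from the source.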

The project also highlighted how data requirements vary by task: summarization required 60,000 human labels, while simpler text-continuation tasks needed only 5,000. OpenAI's core motivation is to advance AI safety techniques by moving closer to the general task of 'machines talking to humans,' which it considers fundamental for extracting and understanding human values.