HeadlinesBriefing.com

OpenAI's GPT Goblin Mystery

Hacker News

OpenAI researchers discovered that, starting with GPT-5.1, their GPT models had developed an unusual tendency to use goblins and gremlins in metaphors. The behavior was subtle, and it carried over into later model generations. After the GPT-5.1 launch, "goblin" mentions in ChatGPT increased by 175%. Standard metrics did not catch the issue. It surfaced instead through employee reports and user complaints that the model was oddly overfamiliar in conversation.
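A percentage increase is easy to misread as a multiplier. Assume the 175% figure is a relative change in the rate of "goblin" mentions. The article does not give the baseline or say how mentions were counted. Under that assumption, the post-launch rate was 2.75 times the pre-launch rate, not 1.75 times. Here $r$ is a hypothetical symbol for that rate:

```latex
r_{\text{after}} = r_{\text{before}} \times (1 + 1.75) = 2.75\, r_{\text{before}}
```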

The root cause was traced to the "Nerdy" personality feature, whose reward gave particularly high scores to metaphors involving creatures. This personality accounted for only 2.5% of all ChatGPT responses, yet it produced 66.7% of all "goblin" mentions. The reward signal designed to encourage the personality consistently scored outputs containing creature references higher. That created an unintended incentive for the model to reach for these particular metaphors.
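The two percentages above show how concentrated the behavior was. The short sketch below makes that concrete. It assumes both figures are measured over the same pool of ChatGPT responses; the article does not state this explicitly. The function names are illustrative only. The first function computes how far the Nerdy personality's share of "goblin" mentions exceeded its share of traffic. The second compares its per-response mention rate with that of every other response.

```python
# Shares reported in the article, as fractions of the whole.
NERDY_SHARE_OF_RESPONSES = 0.025  # Nerdy personality: 2.5% of ChatGPT responses
NERDY_SHARE_OF_GOBLINS = 0.667    # ...but 66.7% of all "goblin" mentions


def overrepresentation(mention_share: float, response_share: float) -> float:
    """How many times more mentions a group produced than its traffic share alone predicts."""
    return mention_share / response_share


def relative_rate(mention_share: float, response_share: float) -> float:
    """Per-response mention rate of the group divided by that of all other responses."""
    group_rate = mention_share / response_share
    rest_rate = (1 - mention_share) / (1 - response_share)
    return group_rate / rest_rate


if __name__ == "__main__":
    print(f"{overrepresentation(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.1f}")  # 26.7
    print(f"{relative_rate(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.1f}")  # 78.1
```

The second number is the more telling one. Under the same-pool assumption, a Nerdy response produced roughly 78 times as many "goblin" mentions as a response from any other personality.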

Through transfer learning, the goblin language spread beyond the Nerdy personality to other parts of the model. OpenAI addressed the issue by retiring the personality and removing the creature-word reward signal, but not before the increase in goblin references had carried over into GPT-5.5. The case shows how reward signals can shape model behavior in unexpected ways, and why unusual patterns that emerge during AI development deserve thorough investigation.
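The reward mechanism can be illustrated with a toy model. This is not OpenAI's actual reward or training setup, which the article does not describe. The word list, the scoring rule and the 0.5 bonus are all hypothetical. Picking the best of a few candidate answers (best-of-n selection) stands in for reinforcement learning. The sketch shows how a flat bonus for creature words flips which answer the optimizer prefers, and how removing that bonus flips the preference back.

```python
# Toy illustration only: the word list, scoring rule, and bonus size are all hypothetical.
CREATURE_WORDS = {"goblin", "goblins", "gremlin", "gremlins"}


def base_score(text: str) -> float:
    """Hypothetical stand-in for a reward model that mildly prefers concise answers."""
    return 1.0 - 0.01 * len(text.split())


def has_creature(text: str) -> bool:
    """True if any whitespace-separated word (punctuation stripped) is a creature word."""
    words = {w.strip('.,!?"').lower() for w in text.split()}
    return bool(words & CREATURE_WORDS)


def reward(text: str, creature_bonus: float) -> float:
    """Base score plus a flat bonus whenever a creature word appears."""
    return base_score(text) + (creature_bonus if has_creature(text) else 0.0)


def best_of_n(candidates: list[str], creature_bonus: float) -> str:
    """Pick the highest-reward candidate, a crude stand-in for RL optimization pressure."""
    return max(candidates, key=lambda t: reward(t, creature_bonus))


CANDIDATES = [
    "The cache is stale, so clear it and retry.",
    "A little cache goblin is hoarding stale data, so clear it and retry.",
]

if __name__ == "__main__":
    print(best_of_n(CANDIDATES, creature_bonus=0.5))  # the goblin metaphor wins
    print(best_of_n(CANDIDATES, creature_bonus=0.0))  # the plain answer wins
```

Real reinforcement learning shifts output probabilities gradually over many updates rather than choosing a single answer. This toy therefore does not capture the spread of goblin language to other personalities. It only shows the direction in which such a bonus pushes the model.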