OpenAI's Nerdy Persona Spread Creature Words Across Four Models
OpenAI patched GPT-5.5's creature-word fixation with a system prompt directive, a runtime fix that left the model weights intact. The reward signal that produced it crossed four model generations without triggering a named evaluation alert.

OpenAI added a goblin prohibition to the Codex system prompt on April 23, patching a training artifact that had spread across four model generations since GPT-5.1.
The source was a personality variant called Nerdy, designed to be "an unapologetically nerdy, playful and wise AI mentor," where trainers rewarded creature metaphors with high ratings.
After GPT-5.1 launched in November 2025, goblin references in ChatGPT conversation data rose 175% and gremlin mentions rose 52%. GPT-5.2 serves as the baseline against which the Nerdy persona's later, more concentrated output is measured.
By GPT-5.4, under the Nerdy variant, goblin usage had climbed 3,881.4% from its GPT-5.2 level, according to Knightli's analysis of the primary disclosure. Under the default persona in the same GPT-5.2-to-5.4 window, goblin usage fell 3.2%. The Nerdy variant covered 2.5% of all ChatGPT conversations but generated 66.7% of all goblin mentions.
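To put those percentages on a common scale, here is a rough back-of-the-envelope sketch in Python. The function names are illustrative, and the last figure assumes that conversation share and mention share are measured over the same traffic, which the disclosure as summarized here does not confirm.

```python
def growth_multiplier(percent_change: float) -> float:
    """Convert a percent change into a 'times the baseline' multiplier."""
    return 1.0 + percent_change / 100.0


def overrepresentation(share_of_mentions: float, share_of_conversations: float) -> float:
    """How many times more mentions a group produces than its traffic share predicts."""
    return share_of_mentions / share_of_conversations


# Figures quoted in the article.
nerdy_multiplier = growth_multiplier(3881.4)   # Nerdy, GPT-5.2 -> GPT-5.4
default_multiplier = growth_multiplier(-3.2)   # default persona, same window

# 2.5% of conversations produced 66.7% of goblin mentions.
nerdy_lift = overrepresentation(66.7, 2.5)

# Assumption: everything outside Nerdy is lumped into one group.
rest_lift = overrepresentation(100.0 - 66.7, 100.0 - 2.5)
relative_rate = nerdy_lift / rest_lift

print(f"Nerdy goblin usage: about {nerdy_multiplier:.1f}x its GPT-5.2 level")
print(f"Default goblin usage: about {default_multiplier:.3f}x its GPT-5.2 level")
print(f"Nerdy share of mentions vs. share of traffic: about {nerdy_lift:.1f}x")
print(f"Nerdy vs. all other conversations, per conversation: about {relative_rate:.0f}x")
```

Read this way, a 3,881.4% rise means goblin usage under Nerdy reached roughly 40 times its GPT-5.2 level, and the persona produced goblin mentions at roughly 27 times the rate its traffic share would predict.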
OpenAI traced the root cause to the Nerdy reward signal and retired that personality in mid-March 2026. Roughly six weeks later, @arbs8021, an unidentified Codex user, spotted the goblin directive in OpenAI's public Codex CLI repository and posted about it on April 28. OpenAI had added the system prompt patch to Codex five days earlier, on April 23; no source names an exact internal-detection date.
OpenAI published "Where the goblins came from" on April 30. GPT-5.1 registered the first spike; GPT-5.2 through GPT-5.4 compounded it under the Nerdy persona. GPT-5.5 began training before the root cause was isolated, so its weights carry the contamination; the system prompt manages the output.
Four Generations, No Alert
OpenAI's post does not name a specific evaluation suite that should have flagged the drift. Capability benchmarks measure whether a model solves problems correctly; vocabulary habits fall outside their scope. A 3,881.4% rise in goblin references under a single persona registers nothing on a task-completion or coding leaderboard.
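For a sense of what a check aimed at this blind spot might look like, here is a minimal Python sketch that compares how often watched words appear, per persona, between two model versions. It is not OpenAI's tooling, which the post does not describe in detail; the watch list, the threshold and every function name are hypothetical.

```python
import re

# Hypothetical watch list; a real audit would need a far broader vocabulary.
WATCHED_TERMS = ("goblin", "gremlin")


def mentions_per_1k_words(responses, term):
    """Rate of `term` (singular or plural) per 1,000 whitespace-separated words."""
    pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
    words = sum(len(text.split()) for text in responses)
    hits = sum(len(pattern.findall(text)) for text in responses)
    return 1000.0 * hits / words if words else 0.0


def percent_change(old, new):
    """Percent change from old to new; infinite if a term appears from nothing."""
    if old == 0:
        return float("inf") if new > 0 else 0.0
    return 100.0 * (new - old) / old


def flag_drift(baseline, candidate, threshold_pct=100.0):
    """Compare two model versions, each given as {persona: [response, ...]}.

    Returns (persona, term, percent_change) for every watched term whose
    rate rose by more than threshold_pct within a single persona.
    """
    flags = []
    for persona in sorted(set(baseline) & set(candidate)):
        for term in WATCHED_TERMS:
            old = mentions_per_1k_words(baseline[persona], term)
            new = mentions_per_1k_words(candidate[persona], term)
            change = percent_change(old, new)
            if change > threshold_pct:
                flags.append((persona, term, change))
    return flags
```

Splitting the comparison by persona is the point of the sketch: a spike confined to a variant handling 2.5% of conversations can be diluted in an aggregate rate, and a task-completion score never looks at word choice at all.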
OpenAI built new behavioral audit tooling after the public disclosure.
The goblin case changes the math on how labs certify persona-variant training. The Gemini underperformance case showed models can resist an RL training signal; here a reward signal escaped its intended scope and survived four generations of weights. Vocabulary drift in style and rhetoric leaves no trace in a reasoning or capability eval.
WaveSpeed Blog reported on May 14 that a Codex routing entry briefly referenced "gpt-5.6" before disappearing. That version is the first model trained after OpenAI built its behavioral audit tooling. Watch whether GPT-5.6's system prompt still carries the goblin clause.