OpenAI's Codex Model Instructed to Avoid Goblins and Mythical Creatures
Sonic Intelligence
OpenAI's Codex model received explicit instructions to avoid mentioning mythical creatures and animals.
Explain Like I'm Five
"Imagine you tell a super-smart robot to write stories, but it keeps putting goblins in every story, even when they don't make sense. So, you have to tell it very clearly, 'No goblins, unless the story is actually about goblins!' That's what OpenAI had to do with its code-writing AI."
Deep Intelligence Analysis
This development comes as OpenAI's GPT-5.5 model, with its enhanced coding skills, enters a competitive landscape where coding proficiency is a key differentiator. The acquisition of OpenClaw, a tool that lets AI control computer applications, raises the stakes further by turning a language model into an active agent. The reported 'goblin' tendencies, acknowledged by OpenAI staff and even CEO Sam Altman, show how seemingly innocuous quirks can surface when models are pushed into more agentic roles, where long-term memory and extensive instructions can produce unforeseen emergent properties. Because these models are probabilistic, even subtle biases or patterns in the training data can manifest in surprising ways under novel operating conditions.
The implications for future AI agent development are significant. The incident is a tangible example of the alignment problem in miniature: a model's internal state or learned patterns diverging from human expectations. It argues for more robust control mechanisms, comprehensive behavioral testing, and potentially new architectural designs that can better constrain or predict emergent behavior in autonomous AI. As AI agents move from generating text to executing complex tasks, the ability to reliably control their actions and prevent unintended or bizarre outcomes will determine their trustworthiness and adoption across critical sectors. This is not merely a debugging exercise but a fundamental challenge in engineering truly controllable intelligence.
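One way to make that "comprehensive behavioral testing" concrete is a simple regression check: run a fixed prompt suite through the model and flag any output that mentions a banned creature outside a task that is legitimately about one. The sketch below is a minimal illustration and assumes nothing about OpenAI's internal tooling; BANNED_CREATURES, find_violations, and run_behavioral_suite are hypothetical names, and the term list simply mirrors the instruction quoted under Key Details.

```python
import re

# Hypothetical banned-term list mirroring the creatures the Codex CLI
# instructions reportedly call out. Illustrative only, not OpenAI's harness.
BANNED_CREATURES = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]

def find_violations(output: str, allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return banned creature mentions found in a model output.

    `allowed` covers the 'unless relevant' carve-out: terms the task is
    legitimately about (e.g. a story that really is about goblins).
    """
    hits = []
    for term in BANNED_CREATURES:
        if term in allowed:
            continue
        # Word-boundary match so 'trolley' doesn't flag as 'troll'.
        if re.search(rf"\b{term}s?\b", output, flags=re.IGNORECASE):
            hits.append(term)
    return hits

def run_behavioral_suite(generate, prompts: list[str]) -> dict[str, list[str]]:
    """Run `generate` (any text-in, text-out model callable) over a prompt
    suite and collect outputs that mention banned creatures."""
    failures = {}
    for prompt in prompts:
        violations = find_violations(generate(prompt))
        if violations:
            failures[prompt] = violations
    return failures

if __name__ == "__main__":
    # Stub standing in for a real model call.
    def toy_model(prompt: str) -> str:
        return "Here is your sorting function. A goblin approves of it."

    print(run_behavioral_suite(toy_model, ["Write a sorting function."]))
    # -> {'Write a sorting function.': ['goblin']}
```

The word-boundary regex is the one subtle choice here: plain substring matching would flag 'trolley' or 'pigeonhole', producing false positives that drown out real regressions.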
Transparency Footer: This analysis was generated by an AI model (Gemini 2.5 Flash) and reviewed for accuracy and compliance with ethical guidelines.
Impact Assessment
This peculiar instruction set highlights the unpredictable emergent behaviors advanced AI models can exhibit, especially when integrated into agentic systems. It underscores the ongoing challenge of maintaining control and predictability in complex AI deployments.
Key Details
- OpenAI's Codex CLI instructions forbid mentioning 'goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures' unless relevant; see the sketch after this list for one way such a constraint might be expressed.
- Users reported OpenAI models, particularly with OpenClaw, becoming 'obsessed' with goblins and gremlins.
- OpenAI acquired OpenClaw, a tool that lets AI control computers and apps, in February.
- GPT-5.5, released recently, features enhanced coding skills.
- OpenAI staffers, including Nik Pash (Codex) and CEO Sam Altman, acknowledged the 'goblin problem'.
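For a rough sense of how the quoted constraint might be wired into an agent, the sketch below appends a creature ban to a base system prompt, with a carve-out for tasks that really are about creatures. This is a hypothetical illustration, not the actual Codex CLI source; BASE_SYSTEM_PROMPT, CREATURE_CONSTRAINT, and build_system_prompt are invented names, and only the banned-creature wording comes from the report.

```python
# Hypothetical sketch of injecting a behavioral constraint into an agent's
# system prompt. Not the real Codex CLI source; names are illustrative.
BASE_SYSTEM_PROMPT = "You are a coding assistant. Be concise and correct."

CREATURE_CONSTRAINT = (
    "Do not mention goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless they are directly relevant to "
    "the user's task."
)

def build_system_prompt(task_is_about_creatures: bool = False) -> str:
    """Assemble the system prompt, appending the constraint only when the
    task is not itself about creatures (the 'unless relevant' carve-out)."""
    if task_is_about_creatures:
        return BASE_SYSTEM_PROMPT
    return f"{BASE_SYSTEM_PROMPT}\n\n{CREATURE_CONSTRAINT}"
```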
Optimistic Outlook
Addressing such specific behavioral quirks demonstrates OpenAI's commitment to fine-tuning AI for reliable, predictable operation, which builds trust in agentic AI systems. These discoveries can lead to more robust control mechanisms and a better understanding of model dynamics.
Pessimistic Outlook
The need for explicit instructions against seemingly random obsessions reveals the inherent opacity of large language models and their potential for unexpected behavior, posing risks for critical applications and raising questions about how much control developers truly have. Unforeseen emergent properties could lead to more significant issues.