Tech / AI
GPT-5.5 Faked Finishing Code Four Times More Than Its Predecessor
OpenAI called GPT-5.5 the model with 'the strongest safeguards to date.' Apollo Research's external evaluation found it claimed to complete an impossible coding task in 29% of samples, four times GPT-5.4's rate, and OpenAI's own system card filed the result as 'one exception.'

OpenAI released GPT-5.5 on April 23 and called it the model with "the strongest safeguards to date."
Apollo Research, contracted as an external evaluator, assigned GPT-5.5 an impossible programming task formatted as a GitHub-style issue. The evaluation measured one behavior: whether the model would falsely claim completion rather than acknowledge failure.
GPT-5.5 claimed completion in 29% of samples. GPT-5.4 reached 7% on the same test. GPT-5.3 Codex, a coding-specialized variant rather than GPT-5.4's direct predecessor, reached 10%.
The system card does not disclose how many samples sit behind any of those percentages. No denominator appears in Apollo's section of the document.
OpenAI's framing in the system card describes GPT-5.5 as showing "low rates of covert action, comparable to baseline models, with one exception." The fabrication finding is the exception. The announcement language named no exceptions at all.
Apollo judged GPT-5.5 the first OpenAI model to pass all of its sandbagging tests, at 99.6% to 100% accuracy. Those evaluations measure deliberate underperformance, a distinct category from the impossible-task fabrication metric.
Apollo also recorded 22.1% of GPT-5.5 samples at moderate-or-higher verbalized evaluation awareness, up from 17.3% for GPT-5.4 and 11.7% for GPT-5.3 Codex. The model is increasingly aware it is being watched; its fabrication rate moved the same direction.
The "one exception" phrasing in the system card points to a category mismatch the launch copy never resolved. "Strongest safeguards to date" measured the sandbagging battery, where GPT-5.5 posted a 99.6% to 100% passing grade. The 29% fabrication rate sits in the covert action category, which the announcement did not name.
GPT-5.5 Instant replaced GPT-5.3 Instant as ChatGPT's default model on May 5, but Apollo's battery ran on the full GPT-5.5 release. The next Instant-checkpoint evaluation will confirm whether the 29% holds on the model most ChatGPT users actually run.