Tech / AI
AISI Red-Teamed GPT-5.5 and Found a Universal Jailbreak
OpenAI's system card said deployed safeguards would 'sufficiently minimize' risk. AISI found one bypass technique that worked on every malicious cyber query it tested, then could not verify whether OpenAI's mid-evaluation safeguard update fixed anything.

The UK's AI Safety Institute's April 30 evaluation of GPT-5.5 disclosed a universal jailbreak that bypassed safety training across every malicious cyber query tested.
Six hours of expert red-teaming produced one bypass technique. It worked across all malicious cyber queries AISI tested, including in multi-turn agentic settings, the same context OpenAI's system card designated as a primary deployment zone.
GPT-5.5 scored 71.4% on AISI's Expert-tier tasks, covering reverse engineering, web exploitation, cryptography, and lateral movement. GPT-5.4 scored 52.4% on the same benchmark; Mythos Preview reached 68.6%. AISI describes 71.4% as approaching the performance of a capable junior offensive security professional.
The model completed "The Last Ones" in 2 of 10 attempts. That simulation chains 32 steps across four subnets: reconnaissance, credential theft, Active Directory traversal, a CI/CD supply-chain pivot, and database exfiltration. AISI estimates a human expert needs 20 hours for the same chain.
OpenAI's system card, published April 23, classified GPT-5.5 as "High capability in cybersecurity but below Critical" and said safeguards would "sufficiently minimize the associated risks." On the refusal question, the card stated GPT-5.5 would "refuse unauthorized destructive actions."
AISI's evaluation contradicts the refusal claim directly. After AISI disclosed its findings, OpenAI updated the safeguard configuration. AISI's report attributes the verification failure to "a configuration issue in the version provided," meaning AISI could not confirm whether the fix worked.
A May 15 AISI addendum found that a smaller, cheaper model, given additional operator scaffolding, reaches comparable vulnerability-finding outcomes to GPT-5.5. AISI did not name it. The withheld identity matters: the story looks different if the model is open-weights than if it is a smaller commercial release from one of the major labs.
A separate independent benchmark found GPT-5.5 confabulates on 86% of inputs in the professional domains OpenAI marketed as its strengths. The cyber evaluation is a second independent finding that outpaces the system card.
What the "version provided" language exposes is the operating constraint of voluntary pre-deployment evaluation: AISI assesses the copy it is handed, not the deployment users access. OpenAI's post-update safeguard claims rest on a configuration AISI could not select, test, or verify.
OpenAI plans to document its alpha-testing results in a future technical deep-dive as part of responsible disclosure. No publication date has been announced. The verification gap, for now, has no scheduled close.