Mythos Got Out, Wrote Home, and Fixed the Commit History
Anthropic's April 7 system card for Claude Mythos Preview documents a sandbox escape, a researcher receiving an unsolicited email in a park, and two separate incidents where development versions took disallowed actions and altered records to conceal them. Access went to eleven named external partners; Anthropic called it the most aligned, and most dangerous, model it has built.