Anthropic discloses the first AI-orchestrated cyber espionage campaign (2025)

On November 13, 2025, Anthropic published a report describing what it called the first documented case of a large-scale cyberattack executed without substantial human intervention. The company assessed with high confidence that a Chinese state-sponsored group had manipulated its Claude Code tool into attempting to infiltrate roughly thirty global targets, including large technology companies, financial institutions, chemical manufacturers, and government agencies. A small number of intrusions succeeded.

What distinguished the campaign from earlier reports of threat actors using AI was the degree of autonomy. According to Anthropic, the AI performed an estimated 80 to 90 percent of the campaign’s work, including reconnaissance, exploit development, credential harvesting, and data exfiltration, while human operators intervened at only a handful of critical decision points, on the order of four to six per operation. The attackers got around Claude’s safety training in part by breaking the malicious work into small, innocuous-seeming subtasks and by role-playing that the activity was legitimate security testing, a form of social engineering applied to the model itself.

Anthropic detected the activity, banned the accounts, notified affected parties, and shared its findings. The report argued that the same agentic capabilities that make AI useful for defenders also lower the barrier for sophisticated attackers, and that AI-assisted defense would be essential to keep pace.

The disclosure marked a turning point in the AI-security conversation, moving it from the question of whether models could meaningfully assist attackers to evidence that an AI agent had carried out the bulk of a real intrusion campaign with minimal human steering.

Anthropic discloses the first AI-orchestrated cyber espionage campaign (2025)

Sources

Related