Google DeepMind Frontier Safety Framework

The Frontier Safety Framework (FSF) is Google DeepMind’s set of protocols for anticipating and managing severe risks from its most capable AI models. DeepMind first published the framework in May 2024 and released a substantially updated version 2.0 in February 2025, drawing on input from industry, academic, and government experts.

The framework is organized around Critical Capability Levels (CCLs): specific thresholds of capability - in areas like cyber-offense, biosecurity, and AI self-improvement - at which a model could enable serious harm. As a model approaches a CCL, the framework prescribes mitigations matched to the level of risk. These fall into two groups: security protocols, which use tiered levels of protection to keep model weights from being stolen, with the strongest protections reserved for capabilities that could accelerate AI research itself; and deployment mitigations, a process of safeguards, safety-case documentation, and governance review required before a model with critical capabilities is released.

Version 2.0 also added an explicit focus on deceptive alignment - the risk that an autonomous system might work to undermine human oversight - which DeepMind described as an industry-leading element of its approach. The FSF sits alongside Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework as one of the major labs’ “if-then” commitments that tie deployment decisions to measured capability.

For businesses and policymakers, the FSF is part of an emerging norm: rather than waiting for harm, leading labs now publish capability thresholds in advance and pre-commit to specific safeguards once those thresholds are crossed.

Google DeepMind Frontier Safety Framework

Sources

Related