Anthropic's Responsible Scaling Policy

On September 19, 2023, Anthropic published its Responsible Scaling Policy (RSP), a framework of technical and organizational commitments designed to manage catastrophic risks from increasingly capable AI models. The policy, which was formally approved by Anthropic’s board, was one of the first attempts by a frontier AI lab to bind its own training and deployment decisions to a published, graduated risk standard.

The core of the policy is a system of AI Safety Levels (ASL), loosely modeled on the U.S. government’s biosafety level (BSL) standards for handling dangerous biological materials. ASL-1 covers systems that pose no meaningful catastrophic risk, such as a 2018-era language model or an AI system that only plays chess. ASL-2 covers present-day large language models, including the Claude models of the time. These models show early signs of dangerous capabilities, but the information they provide is not yet reliable enough to be useful or does not go beyond what is already available without them, for example through a search engine. ASL-3 covers models that substantially increase the risk of catastrophic misuse compared with non-AI baselines such as search engines or textbooks, or that show low-level autonomous capabilities; reaching this level triggers stronger security and deployment safeguards. ASL-4 and higher are reserved for future, more capable systems, and their criteria were left to be defined later.

The central commitment is conditional: Anthropic stated it would not train or deploy a model at a given safety level until it had implemented the safeguards required for that level. In effect, the policy commits the company to pause scaling if its safety measures fail to keep pace with the capabilities of its models. Anthropic framed this as analogous to pre-market safety testing in industries such as aviation and pharmaceuticals.

The RSP influenced industry practice. Other frontier labs later published comparable frameworks, including OpenAI’s Preparedness Framework and Google DeepMind’s Frontier Safety Framework. The approach of pairing capability thresholds with predefined safeguards also fed into the voluntary Frontier AI Safety Commitments that AI companies made at the AI Seoul Summit in May 2024.
