Anthropic's Frontier Safety Roadmap

We believe that AI capabilities will improve rapidly in the coming years. We will need to quickly and dramatically improve our state of preparedness in a number of areas, especially:

Security

Preventing theft, sabotage and/or manipulation of our AI models.

Safeguards

Preventing dangerous use of our models via product surfaces and within Anthropic itself.

Alignment

Ensuring that our models themselves do not autonomously cause harm, and instead consistently behave in line with our Constitution.

Policy

Laying out and advocating for a tangible path for policymakers to effectively manage AI risks across the industry, worldwide.

Our Frontier Safety Roadmap aims to chart a course in public for some of our highest-priority goals. Our hopes are twofold.

First, many (though not all) of these goals will need large company-wide efforts. Many people, across many different departments, will need to coordinate with each other across reporting lines. By clearly and publicly articulating these goals, we are aiming for a "forcing function" that helps us pull off this kind of coordination and prioritization. We believe our Responsible Scaling Policy has served this function effectively in the past, as with our ASL-3 protections.

Second, we want to share how we're thinking about the future of safety, at the level of tangible practices. We hope that other AI developers will take inspiration from these goals, and publish their own that we can learn from in turn. And we think policymakers and customers should be aware of where AI safety practices are headed.

These roadmaps are subject to change. Some changes may simply reflect our evolving understanding of how best to mitigate key risks. However, we will strive to avoid situations where we revise the goals in a less ambitious direction because we simply can't execute.

We will provide updates on whether we achieve the goals, and set new goals whenever we do.

Note: we have made redactions from our public Frontier Safety Roadmap for reasons including protecting sensitive IP and not giving too much information about our current protections to threat actors. The unredacted version is shared with all full-time employees as well as our board and Long-Term Benefit Trust (LTBT).

Our goals as of February 19th, 2026

These goals pair efforts to rapidly and responsibly develop AI with measures to mitigate the potential for our systems to cause undue harm.

Security | Target: April 1, 2026

Launching “moonshot R&D” projects

Security is an ongoing and immediate priority for us, but it is also a long-term challenge where we’ll need to be creative and explore promising and incomplete ideas. This is because we may, at some point in the future, be targets of the world’s best-resourced attackers. Our moonshot R&D projects will explore ambitious, possibly unconventional ways to achieve unprecedented levels of security.

Possible moonshot projects range from a small-scale “mock secure research environment” (simulating what our key workflows and infrastructure could look like under extreme security) to exploring applications of advanced AI to security.

Security | Target: July 1, 2027

Leveling up across the board

While innovation and creativity are important to security, security is also fundamentally about the strength of the weakest link. That means we need to get a large number of small things right. We have a list of key security improvements that would represent a significant leveling up across many areas of security, and we will aim to achieve a strong majority of them (details below). This will be a large, wide-reaching effort to achieve major hardening of our internal systems against attempts to compromise our model weights, model training, etc.

Safeguards | Target: January 1, 2027

World-class internal red-teaming

Today, we continually test our safeguards against AI misuse by red-teaming them in a variety of ways, including via a bug bounty program that offers rewards to anyone who can identify jailbreak techniques of concern. In the future, we may need to prevent misuse from actors more sophisticated than what we can simulate with these methods. We believe we can develop a method for red-teaming our systems (likely involving significant automation) that outperforms the collective contributions from the hundreds of participants in our bug bounty. This could put us in position to continue leveling up over time and stay ahead of even state-sponsored threat actors with large teams devoted to bypassing our safeguards.

Safeguards | Target: January 1, 2027

Fully automated attack investigations

We have already seen coordinated, sophisticated efforts to misuse our systems for espionage. Stopping these efforts requires more than blocking individual AI outputs; it can involve dedicated investigation of patterns across many users and interactions. We believe we can build a system that conducts effective investigations with minimal or no human involvement, a major step forward for our ability to detect misuse at scale. Our initial efforts will focus on investigating potential cyber attacks on a subset of product surfaces.
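To make the idea of cross-user, cross-session investigation concrete, here is a minimal sketch of signal aggregation. All names and thresholds here are hypothetical illustrations, not Anthropic's actual systems: the point is that interactions that each look benign in isolation can, when aggregated per actor, reveal a coordinated campaign worth escalating.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical illustration only: these types and thresholds are assumptions,
# not a description of any real detection pipeline.

@dataclass(frozen=True)
class Signal:
    actor_id: str    # account or API key behind the interaction
    session_id: str  # conversation or session the signal came from
    category: str    # e.g. "recon", "exploit_dev", "lateral_movement"

def escalate_actors(signals, min_categories=2, min_sessions=3):
    """Flag actors whose weak signals span several distinct attack-stage
    categories across several sessions, suggesting a coordinated effort."""
    by_actor = defaultdict(lambda: (set(), set()))
    for s in signals:
        categories, sessions = by_actor[s.actor_id]
        categories.add(s.category)
        sessions.add(s.session_id)
    return sorted(
        actor
        for actor, (categories, sessions) in by_actor.items()
        if len(categories) >= min_categories and len(sessions) >= min_sessions
    )
```

A real system would of course operate on far richer features and feed its conclusions to human or automated responders; the design point is simply that the unit of analysis is the actor across sessions, not the individual output.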

Safeguards | Target: April 1, 2026

Principles for data retention

We offer many customers “zero data retention” policies so they can be confident that sensitive information is safe. Doing this for all customers, however, would greatly hamper our efforts to detect misuse attempts and continually learn from real-world usage of our systems. We would like better principles for which customers are offered which retention policies, and how we can ensure that “zero data retention” usage remains safe even as AI capabilities improve. We will complete an internal in-depth analysis of key factors and set new Frontier Safety Roadmap goals based on it.

Alignment | Target: October 1, 2026

Upholding Claude’s Constitution

We recently published Claude’s Constitution, which explains the kind of entity we want Claude to be and what principles we hope its behavior will follow. We already have a number of measures in place to ensure that the way we train Claude is in line with the Constitution, but we aim to make these measures more complete and systematic:

  • We will keep our public Constitution up to date and run a systematic oversight process over our training data to evaluate alignment with it.
  • We will perform systematic “alignment assessments” to examine Claude’s behavioral patterns and propensities, and evaluate whether they are in line with the spirit of the Constitution. These alignment assessments will incorporate findings from our interpretability research, and will validate the effectiveness of our methods using testing exercises with intentionally misaligned models. We will publish our findings in our system cards or Risk Reports.

Cross-cutting | Target: January 1, 2027

Moving toward an “eyes on everything” state for our internal AI development activities

For many reasons (preventing theft or sabotage of our models, ensuring adherence to Claude’s Constitution, and ensuring that our model training hasn’t been manipulated by insiders, attackers, or AI models themselves), we want our internal environment to be intensively monitored and comprehensively understood. We aim to establish comprehensive, centralized records of all our critical AI development activities, and to use AI to analyze these records for issues including concerning behavior by insiders (both human and AI) and security threats.

Policy | Target: July 1, 2026

A roadmap for policymakers

We will develop and share a set of ambitious policy proposals to effectively manage industry AI risks globally without unnecessarily limiting the benefits from AI development or slowing the AI development of democracies relative to that of autocracies.

We believe the right framework is a regulatory ladder: requirements that scale with risk. Today's frontier models require transparency and basic oversight. As capabilities increase, we will need more rigorous external testing, stronger incident reporting, and deeper government oversight. At the most advanced capability and risk levels, the appropriate governance analogy may be closer to nuclear energy or financial regulation than to today's approach to software.

As with our advocacy for transparency frameworks as a starting point, we will develop and advocate for more advanced and risk-appropriate proposals in collaboration with a wide array of stakeholders.

Expectations as AI capabilities advance

With the above goals in mind, this section outlines the risk mitigations we believe we’ll be in position to implement as AI capabilities advance, focused on the threat models we emphasize in our Responsible Scaling Policy.

Today’s capabilities and mitigations | February 22nd, 2026

Our most powerful models today—those with the ability to significantly help individuals or groups with basic technical backgrounds create/obtain and deploy chemical and/or biological weapons with serious potential for catastrophic damages—are safeguarded with ASL-3 protections.

We plan to maintain or improve these protections for relevant models. The key elements of these protections are: safeguards at least as robust as our initial Constitutional Classifiers; access controls for trusted users with exemptions to classifier guards; methods such as red-teaming, bug bounties, and threat intelligence for continually assessing the threat of jailbreaks; and a number of noteworthy security controls. The specific tools and methods we use may evolve, but our protections will be at least as rigorous as those we have in place today.

We hope and expect that these measures will remain sufficient to meet our goal of being both (a) robust to persistent attempts to misuse a potentially dangerous capability, and (b) highly protected against most attackers’ attempts at stealing model weights. We will continue to adapt our defenses as attack methods evolve. However, while we can and will maintain the rigor of our protections, we cannot assure a specific level of effectiveness against future attackers, given attackers’ continuous adaptability and the evolving risk landscape.

Additionally, today’s models show strong and improving capabilities for extended, autonomous technical work, and are increasingly relied on for writing high-stakes code. In our Risk Reports, we now regularly analyze the possibility that these models might execute high-stakes sabotage. We believe that our current alignment assessment methods and monitoring practices are sufficient to make the case that the risk of catastrophic sabotage is very low (though not negligible).

Potential advances in the next few months

As capabilities improve, we are monitoring for significant advances in our models' chemical and biological weapons capabilities, focusing on the point where the models could significantly help threat actors (for example, moderately resourced expert-backed teams) create or obtain and deploy chemical and/or biological weapons with potential for catastrophic damages far beyond those of past catastrophes in this category such as COVID-19.

If and when we determine that this is the case, we will apply protections at least as strong as our current ASL-3 protections (see above) to an expanded set of potential use cases for AI, covering the most likely vectors for this threat. Additionally, we will identify the most concerning specific threat pathways, create policy recommendations for early detection and response for such threats, and share this content with policy leaders.

Automated R&D

We believe it is plausible, as soon as early 2027, that our AI systems could fully automate, or otherwise dramatically accelerate, the work of large, top-tier teams of human researchers in domains where fast progress could cause threats to international security and/or rapid disruptions to the global balance of power—for example, energy, robotics, weapons development, and AI itself.

By that point, we expect to have accomplished most of the goals listed above, including:

  • Resourcing and completing significant security “moonshot R&D” projects.
  • Developing world-class internal red-teaming and automated attack investigations.
  • Publishing and advocating for a roadmap for policymakers to achieve industry-wide safety without unnecessarily limiting the benefits from AI development or slowing the AI development of democracies relative to that of autocracies.
  • Achieving an “eyes on everything” state for our internal AI development.
  • Consistently implementing systematic alignment assessments and other measures for upholding Claude’s Constitution.

We hope and expect to meet these targets on time or ahead of schedule, set new ones, and continually raise the bar for our risk mitigations.
