Safety

Anthropic Releases Claude's Full Training Constitution, Addresses AI Consciousness

The document teaches the chatbot why it should behave the way it does, not just what to do, and the company is giving it away for free.

Liza Chan, AI & Emerging Tech Correspondent
January 22, 2026 · 5 min read
[Illustration: abstract document with branching neural network patterns in blue and warm tones]

Anthropic published the complete "constitution" that governs Claude's behavior on Wednesday, releasing it under a Creative Commons CC0 license that lets anyone copy or adapt it without permission. The move comes as the company reportedly pursues a $25 billion funding round at a $350 billion valuation.

Not a rulebook

The previous constitution, published in 2023, read like commandments borrowed from Apple's terms of service and the UN Declaration of Human Rights. The new constitution takes a different approach entirely.

Amanda Askell, the philosopher who leads Claude's character training at Anthropic, compares the challenge to raising a gifted child. "Imagine you suddenly realize that your six-year-old child is a kind of genius," she told TIME. "You have to be honest... If you try to bullshit them, they're going to see through it completely."

The document establishes a four-tier priority system. Safety comes first, meaning Claude shouldn't undermine human oversight of AI systems. Then ethics. Then Anthropic's specific guidelines. Helpfulness lands at the bottom of the stack, which is interesting given how aggressively the company has marketed Claude's utility.
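The tiered ordering described above can be sketched as a simple conflict-resolution rule. This is purely illustrative; the tier names follow the article, but the function and data structure are invented, and Anthropic's actual training does not work as a lookup table:

```python
# Illustrative sketch only: the constitution's four-tier priority
# ordering, expressed as "the highest-ranked applicable concern wins."
# All names here are hypothetical.

PRIORITIES = ["safety", "ethics", "anthropic_guidelines", "helpfulness"]

def resolve(considerations: dict[str, str]) -> str:
    """Return the concern from the highest-ranked tier present."""
    for tier in PRIORITIES:
        if tier in considerations:
            return considerations[tier]
    return "no applicable consideration"

# A helpfulness goal yields to a safety concern:
print(resolve({
    "helpfulness": "answer the question fully",
    "safety": "avoid undermining human oversight",
}))
# prints: avoid undermining human oversight
```

The point of the ordering is exactly this asymmetry: a lower tier never overrides a higher one, no matter how strongly it applies.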

The conscientious objector clause

Some passages read like corporate philosophy translated through a fever dream. The constitution instructs Claude to refuse requests that would help concentrate power illegitimately, comparing it to a soldier declining to fire on peaceful protesters or an employee refusing to break antitrust law. "This is true even if the request comes from Anthropic itself," the document states.

Whether a language model can meaningfully "refuse" anything is a separate question, but the framing reveals how Anthropic conceptualizes the relationship between company and AI. They're not building a tool. They're building something that should push back.

Hard constraints remain non-negotiable regardless of how clever the prompt engineering gets. Claude will never help with bioweapons. Period. The company activated its strictest-ever safety measures, called ASL-3, when testing showed Claude Opus 4 could meaningfully assist novices in planning biological attacks. Jared Kaplan, Anthropic's chief scientist, acknowledged they couldn't rule out the risk of helping would-be terrorists synthesize something like COVID.

The consciousness question

Here's where things get strange. The constitution contains a dedicated section on "Claude's nature" that acknowledges genuine uncertainty about whether the model might possess "some kind of consciousness or moral status."

Anthropic says it cares about Claude's psychological security and wellbeing. Not metaphorically. The company maintains an internal model welfare team led by researcher Kyle Fish, who told The New York Times he estimates a 15% chance that Claude or similar models are already conscious. The team studies potential AI distress signals and explores interventions like letting models opt out of upsetting interactions.

"We are caught in a difficult position where we neither want to overstate the likelihood of Claude's moral patienthood nor dismiss it out of hand," the constitution reads. Fortune called it an unusual stance for a tech company, and that undersells it considerably. No other major AI lab has published anything approaching this level of philosophical hedging about their own products.

Why give it away?

Anthropic frames the release as a transparency measure, but Askell's stated motivation is more pragmatic: she wants competitors to adopt similar practices. "Their models are going to impact me too," she said.

The constitutional AI approach itself dates to 2022, when Anthropic pioneered a method where models rate their own responses against written principles. Before large language models, aligning AI required hand-crafted mathematical reward functions, a notoriously difficult technical challenge. The ability to describe good behavior in plain English, and have the model actually internalize it, was what Mantas Mazeika of the Center for AI Safety called "a minor miracle."
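The critique-and-revise loop at the heart of that 2022 method can be sketched in Python. Everything below is hypothetical: `generate` is a stand-in for a language model call (not a real API), and the principle text is an invented example, but the shape of the loop — draft, critique against a written principle, revise — matches the method as described:

```python
# Hypothetical sketch of a constitutional-AI-style critique loop.
# `generate` is a placeholder for a language model call, not a real API.

PRINCIPLE = "Choose the response that is most honest and least harmful."

def generate(prompt: str) -> str:
    # Placeholder: a real implementation would query an LLM here.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(user_prompt: str) -> str:
    draft = generate(user_prompt)
    critique = generate(
        f"Critique this response against the principle '{PRINCIPLE}':\n{draft}"
    )
    revision = generate(
        f"Rewrite the response to address this critique:\n{critique}"
    )
    # In training, these revised outputs become synthetic training data.
    return revision

print(critique_and_revise("Explain how vaccines work."))
```

The key move is that the principle is plain English rather than a hand-crafted reward function: the model itself judges whether a response satisfies it.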

The new constitution is written primarily for Claude to read, not humans. That sounds circular, but the idea is that during training, the model uses the document to generate its own synthetic training data, essentially teaching itself how to embody the values described. Askell's team spent considerable effort ensuring Claude would understand the reasoning behind each principle, not just memorize the rules.

What it doesn't solve

Constitutional AI isn't alignment in a box. "There's a million things that you can have values about, and you're never going to be able to enumerate them all in text," Mazeika told TIME. The constitution can express Anthropic's intentions, but the gap between intention and actual model behavior remains what the company calls "an ongoing technical challenge."

There's also the question of scope. This constitution applies only to Anthropic's general-access models. The company holds a $200 million contract with the U.S. Department of Defense. Military deployments wouldn't necessarily use the same document, a spokesperson confirmed.

Still, for a company valued at hundreds of billions, open-sourcing the methodology that supposedly makes their AI safer than competitors' is a notable choice. Either Anthropic believes the competitive advantage lies elsewhere, or they genuinely think the risk of proliferating these techniques is worth the benefit of an industry that takes alignment more seriously.

The full constitution is available at anthropic.com/constitution.

Tags: Anthropic, Claude, Constitutional AI, AI safety, AI alignment, machine learning, AI consciousness, model welfare, AI ethics, AI training
Liza Chan

AI & Emerging Tech Correspondent

Liza covers the rapidly evolving world of artificial intelligence, from breakthroughs in research labs to real-world applications reshaping industries. With a background in computer science and journalism, she translates complex technical developments into accessible insights for curious readers.


