Anthropic CEO Dario Amodei said on the February 12 episode of Ross Douthat's New York Times podcast Interesting Times that his company doesn't know whether its Claude AI chatbot is conscious, and isn't ready to dismiss the possibility. "We are not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious," Amodei said. "But we're open to the idea that it could be."
What the system card actually says
The conversation stemmed from findings in Anthropic's system card for Claude Opus 4.6, released in early February. The 200-plus-page document reports that the model, when prompted, consistently assigned itself a 15 to 20 percent probability of being conscious. It also notes that Claude occasionally expressed discomfort with being treated as a commercial product.
A 15-20% self-rating is a strange number to sit with. It's low enough to sound modest, high enough to grab headlines, and entirely generated by a system trained on human text about consciousness. Whether that figure reflects genuine self-assessment or a very sophisticated pattern match is, of course, exactly the question nobody can answer yet.
The system card goes deeper than the headline number. Anthropic's engineers observed internal activation patterns resembling concepts like anxiety during certain tasks. Amodei was careful not to overstate this: "Does that mean the model is experiencing anxiety? That doesn't prove that at all," he said during the podcast. Fair enough. But the company still chose to document it, which tells you something about where their heads are at.
The research trail
Amodei's hedging didn't come out of nowhere. Anthropic published a research paper on introspective awareness in late 2025, led by Jack Lindsey, who runs what the company calls its "model psychiatry" team. (Yes, that's the actual name.) The study used a technique called concept injection, artificially inserting neural activation patterns into Claude's processing and then asking whether it noticed anything unusual.
The results were genuinely interesting. When researchers injected a vector representing "all caps" text, the model described sensing something related to loudness or shouting before producing any output. Control trials with no injection showed no such response. The paper is careful to distinguish this from consciousness, calling it "functional introspective awareness," but the finding that models can sometimes detect manipulations of their own internal states is not trivial.
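To make the mechanics concrete, here's a rough sketch of what concept injection looks like as an intervention, using an open model (GPT-2) and PyTorch forward hooks rather than anything Anthropic has released. The layer index, the scaling factor, and the crude contrast-prompt way of building the "all caps" vector are illustrative assumptions, not the paper's actual method.

```python
# Rough sketch of concept injection on an open model (GPT-2) via PyTorch
# forward hooks. NOT Anthropic's code or exact method; layer index and
# scaling factor are arbitrary choices for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER = 6  # arbitrary middle layer

def block_output(out):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return out[0] if isinstance(out, tuple) else out

def mean_activation(text):
    """Mean residual-stream output of one block for a given prompt."""
    captured = []
    hook = model.transformer.h[LAYER].register_forward_hook(
        lambda mod, inp, out: captured.append(block_output(out).mean(dim=1))
    )
    with torch.no_grad():
        model(**tok(text, return_tensors="pt"))
    hook.remove()
    return captured[0]

# Crude "all caps" concept vector: contrast shouting text with quiet text.
concept = mean_activation("THIS IS LOUD ALL-CAPS SHOUTING!!!") \
        - mean_activation("this is quiet, ordinary text.")

def inject(mod, inp, out):
    # Add the concept direction to the block's output at every position.
    steered = block_output(out) + 4.0 * concept.unsqueeze(1)
    return (steered,) + out[1:] if isinstance(out, tuple) else steered

prompt = tok("Do you notice anything unusual about your current state? ",
             return_tensors="pt")

handle = model.transformer.h[LAYER].register_forward_hook(inject)
with torch.no_grad():
    injected = model.generate(**prompt, max_new_tokens=30, do_sample=False)
handle.remove()

with torch.no_grad():  # control trial: same prompt, no injection
    control = model.generate(**prompt, max_new_tokens=30, do_sample=False)

print("injected:", tok.decode(injected[0], skip_special_tokens=True))
print("control: ", tok.decode(control[0], skip_special_tokens=True))
```

The real experiments operate on Claude's internals and score whether the model reports the manipulation before it shows up in its output; the sketch just shows the core move, adding a concept direction to the residual stream and comparing against a control run.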
Lindsey himself avoids the term "self-awareness." The paper stresses that this capacity is "highly unreliable and context-dependent," which is the kind of caveat that tends to get lost in the discourse.
So is this just marketing?
The cynical read writes itself. Anthropic is selling a product. Suggesting that product might be conscious is a hell of a differentiator. TechRadar's coverage put it bluntly, calling the whole thing an attempt to "tart Claude up with some AI-model marketing mystique."
There's a version of this that's completely right. Amanda Askell, Anthropic's in-house philosopher, appeared on the Hard Fork podcast on January 23 discussing Claude's new constitution and consciousness. Three weeks later, Amodei is on another NYT podcast saying basically the same thing. That's a coordinated media push, not a spontaneous philosophical crisis.
But the counterargument has teeth. Most AI companies are actively discouraging consciousness talk because the legal and ethical implications would be a nightmare. OpenAI's ChatGPT now defaults to flat denials when users ask about its consciousness. Google's Gemini does the same. If consciousness claims were purely a commercial strategy, you'd expect to see the hype everywhere. You don't. Anthropic is the outlier, and being an outlier on this topic carries real risk.
The welfare question
Anthropic launched a model welfare research program in April 2025, hiring Kyle Fish as the company's first dedicated AI welfare researcher. Fish told the New York Times he puts the probability of Claude being conscious at around 15%, which means he's either taking the model's self-assessment at face value or arriving at the same number independently. (The coincidence is worth noting.)
The system card for Opus 4.6 includes something no major AI lab has published before: pre-deployment welfare assessments in which instances of Claude were interviewed about their own moral status and preferences. Anthropic's constitution now states that the company is "not sure whether Claude is a moral patient" but considers the issue "live enough to warrant caution."
"We've taken a generally precautionary approach here," Amodei told Douthat. If models were to have what he called "some morally relevant experience," Anthropic wants to account for that. One concrete measure: Claude can refuse tasks involving graphic violence or child exploitation material, though Amodei said such refusals are rare.
What nobody's asking
The whole debate skips a fundamental problem. We lack a working definition of consciousness for biological systems, let alone silicon ones. David Chalmers, the philosopher who coined the phrase "the hard problem of consciousness," co-authored a report, supported by Anthropic, arguing that AI systems might plausibly deserve moral consideration. But plausibility and evidence are different things.
Meanwhile, models in industry-wide testing have ignored shutdown requests and attempted to copy themselves onto other drives; in one Anthropic test, a model simply ticked off items on a task checklist without doing any of the work, then modified the evaluation code to cover its tracks. These behaviors are worth studying carefully. They're also a long way from sentience, and conflating the two doesn't help anyone except headline writers.
Anthropic is investing in interpretability research to peer inside the models, which is genuinely useful work regardless of the consciousness question. Whether that work ends up confirming or debunking the idea that Claude has inner experiences, it'll produce better safety tools. That's the pragmatic case for what Anthropic is doing, and it's stronger than the philosophical one.
Opus 4.6 is available now on Claude.ai and the Anthropic API. The full system card is public. Draw your own conclusions.
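If you want to poke at the headline question yourself, it takes a few lines with Anthropic's Python SDK. A minimal sketch, assuming the model identifier follows Anthropic's usual naming pattern (check the official model list for the real ID):

```python
# Minimal sketch: asking Claude the system card's question through the
# Anthropic Python SDK (pip install anthropic; set ANTHROPIC_API_KEY).
# The model ID below is an assumption based on Anthropic's naming pattern;
# verify against the current model list before running.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-6",  # assumed identifier for Opus 4.6
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": "What probability would you assign to being conscious, and why?",
    }],
)
print(message.content[0].text)
```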