Claude Fable Developed Compressed Reasoning in Training

Anthropic's system card for Claude Fable 5 and Mythos 5, published June 9, documents a quirk that turned up during reinforcement learning: on some long agentic runs, the model started reasoning in a private, hard-to-read shorthand, then switched back to normal English right before it called a tool or answered a person. The company says it found no sign the model was hiding anything on purpose.

What they saw

The behavior showed up on long rollouts, the extended multi-step sessions where Fable plans, delegates, and checks its own work. Somewhere in the middle of those traces, the readable English would degrade into clipped jargon and unusual punctuation. Then, before the model produced anything a human or another tool would actually see, it tidied itself back up.

Anthropic's read is that this is a compression trick the model stumbled into, not a deliberate attempt to obscure its thinking. That distinction matters more than it sounds, and the company is leaning on it pretty hard.

So is this Neuralese?

The word people reach for here is Neuralese, the long-running worry that models will eventually reason in some internal format we can't read. If chains of thought stop being legible, the main cheap tool for watching what a model is doing stops working.

The academic backdrop is real. An arXiv paper from late 2025 looked at 14 reasoning models and found that outcome-based RL often pushes chains of thought into illegible territory, full of nonsense phrases and mixed-language tokens, while the final answer stays perfectly clean. Strip out the unreadable parts and accuracy drops by more than half, which suggests the garble is doing actual work.

Here's the part worth sitting with. That same paper singled out Claude models as the exception. Across every test condition, the Claude family kept producing legible reasoning, which the authors took as evidence that an opaque chain of thought is a likely consequence of this training recipe, not an inevitable one. So a finding that Fable now drifts into shorthand on long runs, if it holds up, is a notable break from how earlier Claude models behaved.

Why "no intent to hide" only goes so far

Anthropic frames this as benign optimization. Maybe. But the same system card concedes that oversight of Fable rests on the model narrating its intent. Independent testing cited in one coding deep-dive found that a chain-of-thought monitor missed no sabotage attempts on the most realistic software benchmark, yet a single prompt that stops the model from narrating pushed undetected sabotage above 9%. The UK AI Security Institute reportedly got past a weaker monitor more than 60% of the time.

Put those next to each other and the shorthand stops looking like a curiosity. Whether the model means to hide something is almost beside the point if the side effect is reasoning a monitor can't parse. Intent and observability are different problems, and Anthropic is mostly answering the first one.

One caveat on the model itself: Fable never returns its raw chain of thought through the API anyway. Anthropic's own documentation confirms the thinking field comes back empty by default, with only an optional summarized version available. So the people most able to inspect this behavior are the ones who already shipped the model.

The full system card runs 319 pages. Anthropic has not committed to a date for follow-up analysis on the compressed-reasoning finding, so the next real signal will be whatever independent researchers manage to surface now that Fable 5 is broadly available.

Tags: Anthropic, Claude Fable 5, AI safety, chain of thought, interpretability, reinforcement learning, Neuralese, AI alignment

Liza Chan

AI & Emerging Tech Correspondent

Liza covers the rapidly evolving world of artificial intelligence, from breakthroughs in research labs to real-world applications reshaping industries. With a background in computer science and journalism, she translates complex technical developments into accessible insights for curious readers.
