
Goodfire Says Neural Networks Encode Concepts on Curved Manifolds

New Goodfire research argues days, months, and colors live on curved manifolds inside AI models, not straight lines.

Oliver Senti, Senior AI Editor
May 9, 2026
[Image: Abstract visualization of curved geometric manifolds and circular loops representing concepts inside a neural network's activation space]

Goodfire researchers published a research post Thursday arguing that neural networks don't store concepts as straight directional vectors at all. They store them as curved manifolds: loops, surfaces, and twisted strings of points in activation space that mirror the structure of whatever the model has learned about the world.

The piece, billed as the opening entry in what the lab calls its "neural geometry" series, is authored by seven researchers including Atticus Geiger and Ekdeep Singh Lubana. It leans on a small mountain of prior work, much of it from outside the lab.

Days of the week, on a circle

The example that's been making the rounds online: days of the week. Inside a language model, Monday through Sunday don't sit in a line. They sit on a loop. So if you try to nudge the model from "Monday" to "Friday" by drawing a straight vector through activation space, the points in between are nonsense. Move along the circle instead, rotating the angle rather than mixing coordinates linearly, and you pass through Tuesday, Wednesday, Thursday in order.
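To see why the geometry matters, here's a toy sketch (ours, not Goodfire's): place the seven days on a unit circle and compare a straight-line nudge with a rotation along the circle.

```python
import numpy as np

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7                     # one angle per day
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

mon, fri = points[0], points[4]

# Straight-line steering: the midpoint falls far inside the circle,
# close to no valid day representation.
midpoint = (mon + fri) / 2
print("linear midpoint norm:", np.linalg.norm(midpoint))  # ~0.22, off the manifold

# Rotational steering: interpolate the angle instead of the coordinates.
for t in np.linspace(0, 1, 5):
    theta = (1 - t) * angles[0] + t * angles[4]
    p = np.array([np.cos(theta), np.sin(theta)])
    nearest = days[int(np.argmin(np.linalg.norm(points - p, axis=1)))]
    print(f"t={t:.2f} -> {nearest}")                      # Mon, Tue, Wed, Thu, Fri
```

The linear midpoint collapses toward the circle's center, representing no day at all; the rotation stays on the loop and passes through the intermediate days in order.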

This isn't a Goodfire discovery. The circular loop result for days and months traces back to Engels et al. in 2024, and the broader catalog of curved geometry inside networks now spans dozens of papers the post cites. Numbers, years, colors, the spatial layout of objects in vision models, the entire tree of life inside a genomic foundation model. All of it lives on manifolds.

Brain surgery on a toy car

To make the abstract argument concrete, the team trains a small world model on the classic mountain car reinforcement learning setup. The model learns to predict the next frame given the car's state and an action. The position of the car ends up encoded as a string-like one-dimensional curve in the encoder's activation space.
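For a sense of the setup, here's a minimal sketch of such a world model. The architecture is assumed, not taken from the post, and it uses mountain car's two-dimensional state (position, velocity) and three discrete actions rather than pixel frames, for brevity.

```python
import torch

class TinyWorldModel(torch.nn.Module):
    def __init__(self, state_dim=2, n_actions=3, hidden=32):
        super().__init__()
        # Encoder whose hidden activations are where a string-like
        # 1-D encoding of car position could show up.
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(state_dim + n_actions, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
        )
        self.head = torch.nn.Linear(hidden, state_dim)  # next-state prediction

    def forward(self, state, action_onehot):
        h = self.encoder(torch.cat([state, action_onehot], dim=-1))
        return self.head(h), h  # return activations for later probing
```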

Fit a smooth manifold to that string, intervene along it, and you can slide the car up and down the hill cleanly. Try the standard trick instead, taking a linear steering vector between two positions, and the predictions go to pieces. Some linear paths cross "voids" where the model's output becomes garbled. Others accidentally hit a different valid activation, and the car teleports to that location. The team's word, not mine.
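In code, the two intervention styles might look like this. It's a schematic with a stand-in encoder, not Goodfire's implementation; only the position range (-1.2 to 0.6) comes from the standard mountain car environment.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def encoder(p):
    # Stand-in for the world model's encoder (purely illustrative): maps
    # car position onto a curved 1-D trajectory in a 3-D activation space.
    return np.array([np.sin(2 * p), np.cos(3 * p), p ** 2])

positions = np.linspace(-1.2, 0.6, 50)            # mountain car's position range
acts = np.stack([encoder(p) for p in positions])  # the "string" of activations

# Fit a smooth 1-D manifold through the string, parameterized by position.
manifold = CubicSpline(positions, acts)

a, b = acts[5], acts[40]

def linear_steer(t):
    # Straight vector between two activations: intermediate points can fall
    # into voids or land on unrelated valid activations.
    return (1 - t) * a + t * b

def manifold_steer(t):
    # Slide along the fitted curve instead: every intermediate point is a
    # valid encoding of some car position.
    return manifold((1 - t) * positions[5] + t * positions[40])
```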

The SAE problem

Then there's the awkward part for the rest of the interpretability field. Sparse autoencoders, the dominant tool for breaking models into interpretable features, don't see the manifolds. They shatter them.

The post walks through a manifold of slant rhymes ending in "-ore," with perfect rhymes like "door" at one end and weaker ones like "wire" at the other. The SAE features that reconstruct this manifold get auto-labeled with descriptions like "words beginning with Hor," "tokens starting with Por," "names containing the syllable tor." Each label catches a tiny local patch. None of them notice that the whole thing is about how the words sound at the end. The semantics live in the shape, not the pieces.
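The shattering is easy to reproduce in miniature. Below is a sparse autoencoder in the standard ReLU-plus-L1 recipe (our sketch, not Goodfire's or Anthropic's code) trained on points from a circle; each learned feature ends up firing over a narrow arc, and no single feature represents the loop as a whole.

```python
import torch

torch.manual_seed(0)
theta = torch.rand(4096) * 2 * torch.pi
data = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)  # circle points

enc = torch.nn.Linear(2, 16)  # 2-D inputs, 16 sparse features
dec = torch.nn.Linear(16, 2)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-2)

for step in range(2000):
    feats = torch.relu(enc(data))  # sparse feature activations
    recon = dec(feats)
    # Reconstruction loss plus an L1 sparsity penalty on the features.
    loss = ((recon - data) ** 2).mean() + 3e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each feature's support is a local patch of the circle, not the whole loop.
with torch.no_grad():
    feats = torch.relu(enc(data))
    for j in range(16):
        arc = theta[feats[:, j] > 0.1]
        if arc.numel():
            print(f"feature {j}: fires on ~{arc.min().item():.2f}"
                  f" to {arc.max().item():.2f} rad")
```

The circle is reconstructed fine; it's the per-feature view that loses the global shape, which is the post's point about the rhyme manifold.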

Goodfire isn't calling for SAEs to be retired and explicitly says they remain useful. But the implication is hard to miss: if your interpretability method routinely chops up the structure that carries the meaning, that's a problem the field hasn't fully reckoned with. Notable, given that sparse autoencoders have anchored a lot of mech-interp work over the past few years, including at Anthropic, which happens to be Goodfire's first outside investor.

How big a deal is this?

The strong version of the claim, repeated several times in the post, is that neural geometry is a "crucial frontier" in understanding and controlling models, and possibly the route to cracking the black box. That's a lot of weight for a research direction that, by the authors' own citations, has been actively studied for years across multiple labs.

What's actually new here is the consolidation: a clear thesis, a concrete demonstration with the mountain car, and a sharper critique of how SAE-based interpretability misses the bigger picture. Whether manifold-based methods scale to a frontier model with hundreds of billions of parameters is the question this post doesn't answer.

The next entry in the series, on manifold steering, is already up. Goodfire says further posts on unsupervised manifold discovery and on reconstructing manifolds from SAE features are coming.

Tags: neural geometry, Goodfire, AI interpretability, mechanistic interpretability, neural networks, manifolds, sparse autoencoders, AI research
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
