
Goodfire Says Neural Networks Encode Concepts on Curved Manifolds

New Goodfire research argues days, months, and colors live on curved manifolds inside AI models, not straight lines.

Oliver Senti, Senior AI Editor
May 9, 2026
[Image: Abstract visualization of curved geometric manifolds and circular loops representing concepts inside a neural network's activation space]

Goodfire researchers published a research post Thursday arguing that neural networks don't store concepts as straight directional vectors at all. They store them as curved manifolds: loops, surfaces, and twisted strings of points in activation space that mirror the structure of whatever the model has learned about the world.

The piece, billed as the opening entry in what the lab calls its "neural geometry" series, is authored by seven researchers including Atticus Geiger and Ekdeep Singh Lubana. It leans on a small mountain of prior work, much of it from outside the lab.

Days of the week, on a circle

The example that's been making the rounds online: days of the week. Inside a language model, Monday through Sunday don't sit in a line. They sit on a loop. So if you try to nudge the model from "Monday" to "Friday" by drawing a straight vector through activation space, the points in between are nonsense. Move along the circle instead, rotating the angle rather than mixing coordinates linearly, and you pass through Tuesday, Wednesday, Thursday in order.
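To see why the geometry matters, here's a toy sketch (ours, not Goodfire's): place the seven days on a unit circle and compare a straight-line nudge with a rotation along the circle.

```python
import numpy as np

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7                     # one angle per day
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

mon, fri = points[0], points[4]

# Straight-line steering: the midpoint falls far inside the circle,
# close to no valid day representation.
midpoint = (mon + fri) / 2
print("linear midpoint norm:", np.linalg.norm(midpoint))  # ~0.22, off the manifold

# Rotational steering: interpolate the angle instead of the coordinates.
for t in np.linspace(0, 1, 5):
    theta = (1 - t) * angles[0] + t * angles[4]
    p = np.array([np.cos(theta), np.sin(theta)])
    nearest = days[int(np.argmin(np.linalg.norm(points - p, axis=1)))]
    print(f"t={t:.2f} -> {nearest}")                      # Mon, Tue, Wed, Thu, Fri
```

The linear midpoint collapses toward the circle's center, representing no day at all; the rotation stays on the loop and passes through the intermediate days in order.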

This isn't a Goodfire discovery. The circular loop result for days and months traces back to Engels et al. in 2024, and the broader catalog of curved geometry inside networks now spans dozens of papers the post cites. Numbers, years, colors, the spatial layout of objects in vision models, the entire tree of life inside a genomic foundation model. All of it lives on manifolds.

Brain surgery on a toy car

To make the abstract argument concrete, the team trains a small world model on the classic mountain car reinforcement learning setup. The model learns to predict the next frame given the car's state and an action. The position of the car ends up encoded as a string-like one-dimensional curve in the encoder's activation space.
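For a sense of the setup, here's a minimal sketch of such a world model. The architecture is assumed, not taken from the post, and it uses mountain car's two-dimensional state (position, velocity) and three discrete actions rather than pixel frames, for brevity.

```python
import torch

class TinyWorldModel(torch.nn.Module):
    def __init__(self, state_dim=2, n_actions=3, hidden=32):
        super().__init__()
        # Encoder whose hidden activations are where a string-like
        # 1-D encoding of car position could show up.
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(state_dim + n_actions, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
        )
        self.head = torch.nn.Linear(hidden, state_dim)  # next-state prediction

    def forward(self, state, action_onehot):
        h = self.encoder(torch.cat([state, action_onehot], dim=-1))
        return self.head(h), h  # return activations for later probing
```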

Fit a smooth manifold to that string, intervene along it, and you can slide the car up and down the hill cleanly. Try the standard trick instead, taking a linear steering vector between two positions, and the predictions go to pieces. Some linear paths cross "voids" where the model's output becomes garbled. Others accidentally hit a different valid activation, and the car teleports to that location. The team's word, not mine.
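In code, the two intervention styles might look like this. It's a schematic with a stand-in encoder, not Goodfire's implementation; only the position range (-1.2 to 0.6) comes from the standard mountain car environment.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def encoder(p):
    # Stand-in for the world model's encoder (purely illustrative): maps
    # car position onto a curved 1-D trajectory in a 3-D activation space.
    return np.array([np.sin(2 * p), np.cos(3 * p), p ** 2])

positions = np.linspace(-1.2, 0.6, 50)            # mountain car's position range
acts = np.stack([encoder(p) for p in positions])  # the "string" of activations

# Fit a smooth 1-D manifold through the string, parameterized by position.
manifold = CubicSpline(positions, acts)

a, b = acts[5], acts[40]

def linear_steer(t):
    # Straight vector between two activations: intermediate points can fall
    # into voids or land on unrelated valid activations.
    return (1 - t) * a + t * b

def manifold_steer(t):
    # Slide along the fitted curve instead: every intermediate point is a
    # valid encoding of some car position.
    return manifold((1 - t) * positions[5] + t * positions[40])
```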

The SAE problem

Then there's the awkward part for the rest of the interpretability field. Sparse autoencoders, the dominant tool for breaking models into interpretable features, don't see the manifolds. They shatter them.

The post walks through a manifold of slant rhymes ending in "-ore," with perfect rhymes like "door" at one end and weaker ones like "wire" at the other. The SAE features that reconstruct this manifold get auto-labeled with descriptions like "words beginning with Hor," "tokens starting with Por," "names containing the syllable tor." Each label catches a tiny local patch. None of them notice that the whole thing is about how the words sound at the end. The semantics live in the shape, not the pieces.
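The shattering is easy to reproduce in miniature. Below is a sparse autoencoder in the standard ReLU-plus-L1 recipe (our sketch, not Goodfire's or Anthropic's code) trained on points from a circle; each learned feature ends up firing over a narrow arc, and no single feature represents the loop as a whole.

```python
import torch

torch.manual_seed(0)
theta = torch.rand(4096) * 2 * torch.pi
data = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)  # circle points

enc = torch.nn.Linear(2, 16)  # 2-D inputs, 16 sparse features
dec = torch.nn.Linear(16, 2)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-2)

for step in range(2000):
    feats = torch.relu(enc(data))  # sparse feature activations
    recon = dec(feats)
    # Reconstruction loss plus an L1 sparsity penalty on the features.
    loss = ((recon - data) ** 2).mean() + 3e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each feature's support is a local patch of the circle, not the whole loop.
with torch.no_grad():
    feats = torch.relu(enc(data))
    for j in range(16):
        arc = theta[feats[:, j] > 0.1]
        if arc.numel():
            print(f"feature {j}: fires on ~{arc.min().item():.2f}"
                  f" to {arc.max().item():.2f} rad")
```

The circle is reconstructed fine; it's the per-feature view that loses the global shape, which is the post's point about the rhyme manifold.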

Goodfire isn't calling for SAEs to be retired and explicitly says they remain useful. But the implication is hard to miss: if your interpretability method routinely chops up the structure that carries the meaning, that's a problem the field hasn't fully reckoned with. Notable, given that sparse autoencoders have anchored a lot of mech-interp work over the past few years, including at Anthropic, which happens to be Goodfire's first outside investor.

How big a deal is this?

The strong version of the claim, repeated several times in the post, is that neural geometry is a "crucial frontier" in understanding and controlling models, and possibly the route to cracking the black box. That's a lot of weight for a research direction that, by the authors' own citations, has been actively studied for years across multiple labs.

What's actually new here is the consolidation: a clear thesis, a concrete demonstration with the mountain car, and a sharper critique of how SAE-based interpretability misses the bigger picture. Whether manifold-based methods scale to a frontier model with hundreds of billions of parameters is the question this post doesn't answer.

The next entry in the series, on manifold steering, is already up. Goodfire says further posts on unsupervised manifold discovery and on reconstructing manifolds from SAE features are coming.

Tags: neural geometry, Goodfire, AI interpretability, mechanistic interpretability, neural networks, manifolds, sparse autoencoders, AI research
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
