Machine Learning

MIT Researchers Show 59 Scientific AI Models Are Converging on a Shared Understanding of Matter

New study reveals that despite vast differences in training data and architecture, AI models for molecules, materials, and proteins develop remarkably similar internal representations.

Oliver Senti, Senior AI Editor
December 29, 2025 · 5 min read
[Illustration: multiple neural network representations converging toward a shared molecular structure, symbolizing AI models learning aligned representations of matter.]

Nearly sixty machine learning models spanning chemistry, materials science, and protein research are developing aligned internal representations of physical reality, according to a study from MIT's Department of Materials Science and Engineering. The work, led by researchers Sathya Edamadaka and Soojung Yang in Rafael Gómez-Bombarelli's Learning Matter Lab, was selected for an oral spotlight presentation at the NeurIPS 2025 UniReps Workshop.

The question nobody was asking

Here's what makes this strange: the models tested couldn't be more different from each other. Some read molecular structures as text strings. Others work with 2D graphs. Still others process full 3D atomic coordinates. They were trained on different datasets, built on different architectures, and optimized for different tasks.

And yet, when the researchers extracted the internal representations from these models and compared them, they found significant alignment. Models that had never seen the same training example, that processed inputs in completely different formats, were learning to organize chemical information in similar ways.

The finding echoes earlier work on language and vision models. In 2024, a team including Minyoung Huh and Phillip Isola proposed what they called the Platonic Representation Hypothesis, arguing that as AI models get better at their tasks, they converge toward a shared statistical model of reality. The MIT study is the first systematic test of whether this phenomenon extends to scientific domains.

How they measured it

The researchers used a metric called Centered Kernel Nearest-Neighbor Alignment, or CKNNA, borrowed from the language and vision convergence work. The idea is straightforward: if two models agree on which molecules are each other's nearest neighbors in representation space, their representations are considered aligned.
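The paper's exact CKNNA formulation isn't reproduced here, but the core intuition, checking whether two embedding spaces agree on each sample's nearest neighbors, can be illustrated with a simplified mutual nearest-neighbor overlap score. This is a sketch in the same spirit, not the study's metric; the function names and the choice of k are my own.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (by Euclidean distance)
    for each row of the embedding matrix X, excluding the row itself."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def knn_alignment(A, B, k=10):
    """Average overlap (0 to 1) between the k-nearest-neighbor sets
    of two embedding matrices A and B. Rows must correspond to the
    same samples; feature dimensions may differ."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(na, nb)]
    return float(np.mean(overlaps))
```

Under this score, two embeddings that are rotations of one another are perfectly aligned (rotations preserve distances, hence neighbor sets), while two unrelated random embeddings score near chance level.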

They tested 59 models across five datasets covering small molecules, large molecules, inorganic materials, and proteins. The model zoo included everything from simple string encoders to equivariant graph neural networks to protein structure predictors.

Unexpectedly, cross-modality alignment between string-based models and 3D atomistic models exceeded the highest values found in the original language-vision study. Protein models showed even stronger convergence between sequence and structure representations, with alignment scores roughly double those seen for small molecules.

The researchers also tracked what happens as models improve at their training tasks. For machine learning interatomic potentials predicting material energies, better performance correlated with closer alignment to the best-performing models. Models seem to be converging on something, and that something correlates with accuracy.

The limits show up quickly

But there's a catch. The convergence pattern held only for data similar to what the models had seen during training. Push the models outside their comfort zone, and things fall apart.

When the researchers fed large organic molecules from the OMol25 dataset to models trained primarily on inorganic materials from OMat24, the story changed. Almost all models collapsed onto low-information representations, learning similar but essentially useless encodings. Architecture dominated where training data once ruled.

This is the sobering part of the findings. Current scientific foundation models remain fundamentally data-limited. On familiar territory, they develop rich, aligned representations. On unfamiliar structures, they default to shallow fallback modes that miss the chemical detail needed for accurate predictions.

The researchers identified two distinct failure regimes. For in-distribution inputs, weak models scatter into idiosyncratic local optima, each finding its own way to be accurate without learning generalizable structure. For out-of-distribution inputs, nearly all models, weak or strong, converge on the same inadequate representation.

What counts as foundational

The work proposes a practical test for whether a model has achieved foundation status: it should be both high-performing and well-aligned with other high-performing models. A model that achieves strong results through memorization or dataset-specific tricks will show weak alignment with genuinely generalizable models.
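This criterion, strong performance plus strong average alignment with the other strong performers, can be sketched as a simple filter. The function, thresholds, and data below are all hypothetical and purely illustrative; the paper does not publish this procedure in code form.

```python
import numpy as np

def foundational_candidates(acc, align, acc_thresh=0.9, align_thresh=0.5):
    """Flag models that are both high-performing and well-aligned.

    acc:   (n,) per-model accuracy scores (hypothetical scale).
    align: (n, n) symmetric pairwise alignment matrix.

    A model qualifies if its accuracy clears acc_thresh AND its mean
    alignment with the *other* high performers clears align_thresh.
    """
    top = np.where(acc >= acc_thresh)[0]
    candidates = []
    for i in top:
        others = [j for j in top if j != i]
        if others and align[i, others].mean() >= align_thresh:
            candidates.append(int(i))
    return candidates
```

With toy numbers, a model like the MACE-OFF counterexample, accurate on its benchmark but weakly aligned with the other top performers, is filtered out while the aligned high performers pass.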

By this measure, Meta's UMA model emerged as particularly foundational for materials, while Orb V3 Conservative showed strong alignment with other top performers on small molecules. The MACE-OFF model presented an interesting counterexample. It achieved excellent results on the QM9 benchmark but showed weak alignment with other high performers, suggesting its representations may not transfer well beyond that specific chemical space.

What it means for model development

The researchers argue that representational alignment could guide architectural choices. They point to Orb V3 Conservative, which achieves strong alignment with fully equivariant architectures despite not building rotational equivariance into its structure. Instead, it uses a lightweight regularization scheme during training that apparently produces similar effects at lower computational cost.

Model size mattered less than expected. Similar representations emerged from models with vastly different parameter counts, suggesting that smaller models might be distilled from larger ones without losing essential representational structure.

The paper also found that training data trumps architecture for determining representation structure on in-distribution inputs. Models trained on the same dataset clustered together regardless of their design, while models with similar architectures but different training data diverged.

The implication for building genuinely universal scientific models is clear: more diverse training data, spanning both equilibrium and non-equilibrium regimes across broader chemical space. Current datasets don't impose a strong enough statistical signal to unify representations across the field.

The code for extracting embeddings and computing alignment metrics will be released through the Learning Matter Lab's GitHub repository.

Tags: machine learning, materials science, MIT research, foundation models, chemistry AI, protein modeling, computational science, NeurIPS
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

