Google Research dropped a potentially game-changing architecture this week. Titans introduces "test-time memorization," letting AI models update their core memory while actively processing data, not just during training. The result: models that handle over 2 million tokens of context while running faster than traditional transformers.
The breakthrough tackles a fundamental limitation. Standard transformers slow down quadratically as context length grows, since attention compares every token to every other token, making full-document analysis or genomic sequencing computationally brutal. Previous fixes like Mamba compressed context into fixed-size states, but lost critical information in the process. Titans splits the difference by using a deep neural network as its memory module, one that actively learns which information matters through a "surprise metric." Low surprise? Skip it. High surprise? Commit to long-term memory.
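To make the idea concrete, here is a minimal sketch of surprise-gated, test-time memory updates. It is not Google's implementation: it assumes a single linear map as the memory (the paper uses a deep MLP), a squared-error associative loss between keys and values, and a simple gradient-norm threshold as the surprise metric; the names `surprise`, `step`, and the hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Hypothetical memory module: one linear map M (the real Titans memory is a deep MLP).
M = np.zeros((d, d))

def surprise(M, k, v):
    """Gradient of the associative loss ||M k - v||^2 with respect to M.
    A large gradient means the incoming (key, value) pair is 'surprising'."""
    err = M @ k - v
    return 2.0 * np.outer(err, k)

def step(M, k, v, lr=0.1, decay=0.01, threshold=1.0):
    """One test-time update: skip unsurprising tokens, commit surprising ones,
    with a small weight decay acting as gradual forgetting."""
    g = surprise(M, k, v)
    if np.linalg.norm(g) < threshold:
        return M                        # low surprise: skip
    return (1.0 - decay) * M - lr * g   # high surprise: write to memory

# Stream data through the memory at "inference" time. Here the stream follows
# a hidden linear rule v = W k, which the memory gradually absorbs.
W = rng.normal(size=(d, d)) / np.sqrt(d)
for _ in range(500):
    k = rng.normal(size=d)
    M = step(M, k, W @ k)
```

The key point the sketch illustrates: the model's weights change while it processes the stream, not during a separate training phase, and the surprise gate keeps it from wasting capacity on tokens it can already predict.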
The numbers back up the theory. On the BABILong benchmark, which tests reasoning across extremely long documents, Titans outperformed GPT-4 despite running with far fewer parameters. Google also released MIRAS, the theoretical framework underpinning the approach, along with three attention-free model variants (YAAD, MONETA, MEMORA) that swap the standard squared-error memory objective for alternative loss functions designed to make the memory more robust.
The Bottom Line: Google's Titans could reshape how AI handles massive documents, codebases, and genomic data by making real-time learning during inference practical at scale.
QUICK FACTS
- Context window: Scales beyond 2 million tokens
- Beats GPT-4 on BABILong benchmark with fewer parameters
- Authors: Ali Behrouz, Meisam Razaviyayn, Vahab Mirrokni (Google Research)
- Published: December 4, 2025
- Papers: Titans and MIRAS (both available on arXiv)




