Google DeepMind released Gemma Scope 2 on December 19, an open-source interpretability toolkit that lets researchers peer inside the Gemma 3 family of language models. The release spans all model sizes from 270M to 27B parameters. DeepMind calls it the largest open-source release of interpretability tools from an AI lab to date, though that's their own assessment.
The toolkit combines sparse autoencoders (SAEs) and transcoders trained on every layer of the Gemma 3 models. Producing it required storing roughly 110 petabytes of data and training more than 1 trillion parameters in total. Skip-transcoders and cross-layer transcoders track information flow across multiple layers rather than producing isolated per-layer snapshots. The original Gemma Scope covered only Gemma 2 at the 2B and 9B sizes.
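To make the SAE idea concrete, here is a minimal numpy sketch of the kind of sparse autoencoder Gemma Scope trains on model activations: encode an activation vector into an overcomplete sparse feature basis, then reconstruct it. All dimensions and the loss weighting are toy illustrative values, not Gemma Scope 2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 16, 64          # activation width, SAE dictionary size (toy values)
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU keeps only some features active
    x_hat = f @ W_dec + b_dec                # reconstruction from the feature basis
    return f, x_hat

x = rng.normal(size=d_model)                 # stand-in for one residual-stream activation
features, reconstruction = sae_forward(x)
# Training would minimize reconstruction error plus a sparsity penalty:
loss = np.mean((x - reconstruction) ** 2) + 1e-3 * np.abs(features).sum()
```

A transcoder follows the same shape but reconstructs a *different* activation (e.g. an MLP's output from its input) instead of its own input, which is what lets the cross-layer variants trace information flow.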
New features target chatbot behavior specifically. Researchers can now study refusal mechanisms, chain-of-thought faithfulness, and jailbreak attempts in instruction-tuned models. DeepMind also adopted Matryoshka training, a technique intended to help SAEs learn more useful concepts and address shortcomings of the first release's SAEs.
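The core Matryoshka idea can be sketched briefly: reconstruction losses are computed from nested prefixes of the SAE's feature vector, so the earliest features are pressured to carry the most broadly useful concepts. The toy numpy code below illustrates that nesting only; it is not DeepMind's training implementation, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_sae = 16, 64
prefix_sizes = [8, 16, 32, 64]               # nested "dolls" of the feature dictionary

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))

def matryoshka_loss(x):
    """Average reconstruction error over nested feature-prefix decoders."""
    f = np.maximum(x @ W_enc, 0.0)           # sparse features, as in a plain SAE
    losses = []
    for k in prefix_sizes:
        x_hat = f[:k] @ W_dec[:k]            # reconstruct using only the first k features
        losses.append(np.mean((x - x_hat) ** 2))
    return sum(losses) / len(losses)         # every prefix must reconstruct on its own

x = rng.normal(size=d_model)
loss = matryoshka_loss(x)
```

Because each prefix is scored independently, the optimizer cannot hide a broadly useful concept in a late feature slot, which is the failure mode this training scheme is meant to reduce.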
Weights are available on Hugging Face, and Neuronpedia hosts an interactive demo for visualizing feature activations without downloading the weights. Artifacts will continue rolling out through December 31, 2025.
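For readers who want to script the download, Hugging Face serves repository files from a predictable `resolve` URL. The repo and file names below are placeholders for illustration; check the Gemma Scope 2 collection on Hugging Face for the real artifact paths.

```python
from urllib.parse import quote

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL Hugging Face serves repo files from."""
    return (f"https://huggingface.co/{repo_id}"
            f"/resolve/{revision}/{quote(filename)}")

# Placeholder repo and file names, not the actual Gemma Scope 2 paths:
url = hf_resolve_url("google/gemma-scope-2", "layer_12/sae_weights.npz")
```

In practice the `huggingface_hub` library's download helpers do the same resolution with caching, so hand-built URLs are only needed for plain-HTTP tooling.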
The Bottom Line: DeepMind's toolkit gives safety researchers direct access to trace potential hallucinations and jailbreaks across an entire model family, from 270M parameters up to 27B.
QUICK FACTS
- Model coverage: Gemma 3 at 270M, 1B, 4B, 12B, and 27B parameters
- Training scale: 1 trillion+ parameters, 110 petabytes of stored data
- Release date: December 19, 2025
- Previous version: Gemma Scope covered Gemma 2 (2B and 9B) with 400+ SAEs
- Availability: Hugging Face (weights), Neuronpedia (interactive demo)