Google DeepMind released Gemma Scope 2 on December 19, an open-source interpretability toolkit that lets researchers peer inside the Gemma 3 family of language models. The release spans all model sizes from 270M to 27B parameters. DeepMind calls it the largest open-source release of interpretability tools from an AI lab to date, though that's their own assessment.
The toolkit combines sparse autoencoders (SAEs) and transcoders trained on every layer of the Gemma 3 models. Producing it required storing roughly 110 petabytes of data and training more than 1 trillion parameters in total. Skip-transcoders and cross-layer transcoders track information flow across multiple layers rather than producing isolated per-layer snapshots. The original Gemma Scope covered only Gemma 2 at the 2B and 9B sizes.
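To make the SAE idea concrete, here is a minimal numpy sketch of the kind of sparse autoencoder Gemma Scope trains on model activations: encode an activation vector into an overcomplete sparse feature basis, then reconstruct it. All dimensions and the loss weighting are toy illustrative values, not Gemma Scope 2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 16, 64          # activation width, SAE dictionary size (toy values)
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU keeps only some features active
    x_hat = f @ W_dec + b_dec                # reconstruction from the feature basis
    return f, x_hat

x = rng.normal(size=d_model)                 # stand-in for one residual-stream activation
features, reconstruction = sae_forward(x)
# Training would minimize reconstruction error plus a sparsity penalty:
loss = np.mean((x - reconstruction) ** 2) + 1e-3 * np.abs(features).sum()
```

A transcoder follows the same shape but reconstructs a *different* activation (e.g. an MLP's output from its input) instead of its own input, which is what lets the cross-layer variants trace information flow.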
New features target chatbot behavior specifically. Researchers can now study refusal mechanisms, chain-of-thought faithfulness, and jailbreak attempts in instruction-tuned models. DeepMind also adopted Matryoshka training, a technique intended to help SAEs learn more useful concepts and address shortcomings of the first release's SAEs.
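The core Matryoshka idea can be sketched briefly: reconstruction losses are computed from nested prefixes of the SAE's feature vector, so the earliest features are pressured to carry the most broadly useful concepts. The toy numpy code below illustrates that nesting only; it is not DeepMind's training implementation, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_sae = 16, 64
prefix_sizes = [8, 16, 32, 64]               # nested "dolls" of the feature dictionary

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))

def matryoshka_loss(x):
    """Average reconstruction error over nested feature-prefix decoders."""
    f = np.maximum(x @ W_enc, 0.0)           # sparse features, as in a plain SAE
    losses = []
    for k in prefix_sizes:
        x_hat = f[:k] @ W_dec[:k]            # reconstruct using only the first k features
        losses.append(np.mean((x - x_hat) ** 2))
    return sum(losses) / len(losses)         # every prefix must reconstruct on its own

x = rng.normal(size=d_model)
loss = matryoshka_loss(x)
```

Because each prefix is scored independently, the optimizer cannot hide a broadly useful concept in a late feature slot, which is the failure mode this training scheme is meant to reduce.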
Weights are available on Hugging Face, and Neuronpedia hosts an interactive demo for visualizing feature activations without downloading the weights. Artifacts will continue rolling out through December 31, 2025.
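For readers who want to script the download, Hugging Face serves repository files from a predictable `resolve` URL. The repo and file names below are placeholders for illustration; check the Gemma Scope 2 collection on Hugging Face for the real artifact paths.

```python
from urllib.parse import quote

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL Hugging Face serves repo files from."""
    return (f"https://huggingface.co/{repo_id}"
            f"/resolve/{revision}/{quote(filename)}")

# Placeholder repo and file names, not the actual Gemma Scope 2 paths:
url = hf_resolve_url("google/gemma-scope-2", "layer_12/sae_weights.npz")
```

In practice the `huggingface_hub` library's download helpers do the same resolution with caching, so hand-built URLs are only needed for plain-HTTP tooling.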
The Bottom Line: DeepMind's toolkit gives safety researchers direct access to trace potential hallucinations and jailbreaks across an entire model family, from 270M parameters up to 27B.
QUICK FACTS
- Model coverage: Gemma 3 at 270M, 1B, 4B, 12B, and 27B parameters
- Training scale: 1 trillion+ parameters, 110 petabytes of stored data
- Release date: December 19, 2025
- Previous version: Gemma Scope covered Gemma 2 (2B and 9B) with 400+ SAEs
- Availability: Hugging Face (weights), Neuronpedia (interactive demo)