Google Releases T5Gemma 2 with Vision and 128K Context Window

Illustration of T5Gemma 2 encoder-decoder architecture processing text and images

Google shipped T5Gemma 2 on December 18, 2025, upgrading its encoder-decoder model family with image understanding and dramatically longer context windows. The models build on Gemma 3 rather than Gemma 2, which powered the original T5Gemma release earlier this year.

The engineering changes target memory efficiency. Google tied word embeddings between the encoder and decoder, and merged what were previously separate self-attention and cross-attention mechanisms into a single layer. Three sizes are available: 270M-270M (roughly 370M parameters total), 1B-1B (about 1.7B), and 4B-4B (around 7B). The vision encoder adds parameters on top.

On benchmarks, Google claims T5Gemma 2 beats base Gemma 3 on multimodal tasks, long-context processing, and coding. These are internal evaluations, not independent tests. The company attributes the long-context gains to separating input processing into a dedicated encoder.

The pretrained checkpoints are available now on Hugging Face, Kaggle, and Vertex AI. Google is not releasing instruction-tuned versions. Developers will need to fine-tune for specific applications.

The Bottom Line: T5Gemma 2 brings vision and 128K context to encoder-decoder architecture, but remains a research release requiring post-training before deployment.

QUICK FACTS

Model sizes: 270M-270M, 1B-1B, 4B-4B (encoder-decoder pairs)
Context window: up to 128K tokens
Language support: 140+ languages (company-reported)
Availability: Hugging Face, Kaggle, Vertex AI, Colab
Release date: December 18, 2025
Status: Pretrained checkpoints only, no instruction-tuned versions

Tags:Googlemultimodal AIopen source

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

Google Releases T5Gemma 2 with Vision and 128K Context Window

QUICK FACTS

Andrés Martínez

Related Articles

Meituan Open-Sources LongCat-2.0, a 1.6T Coding Model

Linux Foundation Launches Akrites to Coordinate Open Source Patching

Mistral Releases Leanstral 1.5, an Apache-2.0 Lean 4 Proof Model

Stay Ahead of the AI Curve