Google dropped TranslateGemma on Wednesday, a set of open translation models built on Gemma 3. Available in 4B, 12B, and 27B parameter sizes, the models support 55 languages and are designed to run on everything from phones to cloud GPUs. The timing is notable: the release came hours after OpenAI unveiled its own ChatGPT Translate tool.
The headline number is efficiency. According to Google's developer blog, the 12B model outperforms the Gemma 3 27B baseline when measured with MetricX on the WMT24++ benchmark: better translation quality at less than half the parameters. The 4B model, meanwhile, matches the Gemma 3 12B baseline, making it viable for mobile inference. Google's benchmarks are self-reported, so independent testing will determine whether these gains hold up.
The models inherit Gemma 3's multimodal capabilities: on the Vistra image translation benchmark, they showed improvements in translating text embedded in images without task-specific fine-tuning. Google fully evaluated 55 languages on WMT24++ and trained on an additional 500 language pairs that have not yet been benchmarked. The technical report details a two-stage training recipe: supervised fine-tuning on synthetic and human-translated data, followed by reinforcement learning using MetricX-QE and AutoMQM as reward models.
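Google hasn't released its training code, but the RL stage it describes follows a familiar pattern: sample an output, score it with a reward model, and nudge the policy toward higher-scoring outputs. The toy sketch below shows that REINFORCE loop with a trivial stand-in scorer in place of MetricX-QE; the candidate strings, the reward function, and the tiny policy are all illustrative assumptions, not anything from Google's pipeline.

```python
# Conceptual toy of the RL stage: a reward model scores candidate
# translations and the policy is pushed toward higher-scoring ones.
# This is NOT Google's pipeline, just the REINFORCE pattern it names.
import torch
import torch.nn.functional as F

# Hypothetical candidate translations for one source sentence.
candidates = ["hola mundo", "hola, mundo", "saludos mundo"]

def mock_quality_reward(translation: str) -> float:
    # Stand-in for a reward model like MetricX-QE. Assumption here:
    # higher reward = better. Real MetricX scores are lower-is-better,
    # so an actual pipeline would negate or rescale them.
    return 1.0 if "," in translation else 0.2

# Tiny "policy": learnable logits over the candidate list.
logits = torch.zeros(len(candidates), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = F.softmax(logits, dim=-1)
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                  # sample a candidate
    reward = mock_quality_reward(candidates[action])
    loss = -dist.log_prob(action) * reward  # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print(candidates[int(torch.argmax(logits))])  # -> "hola, mundo"
```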
TranslateGemma is available now on Hugging Face, Kaggle, and Vertex AI. The 4B model targets mobile, the 12B fits on consumer laptops, and the 27B needs a single H100 or a cloud TPU.
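For readers who want to try it, the snippet below sketches what inference might look like through Hugging Face's transformers library. The model ID and chat format are assumptions based on how previous Gemma checkpoints have been published; check the actual model card for the correct identifiers and prompt conventions.

```python
# Hedged sketch: loading a TranslateGemma checkpoint with transformers.
# "google/translategemma-4b-it" is a hypothetical model ID -- confirm
# the real one on Hugging Face before running.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/translategemma-4b-it")

messages = [
    {"role": "user",
     "content": "Translate from English to German: The weather is nice today."},
]

# Recent transformers versions accept chat-style message lists directly;
# the last message in the returned conversation is the model's reply.
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```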
The Bottom Line: Google's 12B model reportedly matches or beats larger translation baselines, but those benchmarks are company-reported and await independent verification.
QUICK FACTS
- 55 languages fully evaluated on WMT24++ benchmark
- 12B model scores 3.60 on MetricX vs. 4.04 for 27B baseline (lower is better, company-reported)
- 4.3 billion tokens used in supervised fine-tuning
- 500 additional language pairs trained but not yet benchmarked
- Available on Hugging Face, Kaggle, and Vertex AI