Tencent's Hunyuan team open-sourced Hy-MT1.5-1.8B-1.25bit on April 29, a translation model crushed down to 440MB so it can run fully offline on Android phones. The original 1.8B-parameter base model takes up 3.3GB at FP16. Quantization shrinks it by roughly 87%.
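The reduction figure follows directly from the two sizes above; a quick sanity check, using only the numbers reported here:

```python
# Sanity check on the reported compression figure (sizes from the article).
fp16_mb = 3300   # base model at FP16
quant_mb = 440   # 1.25-bit quantized model
print(f"size reduction: {1 - quant_mb / fp16_mb:.1%}")  # 86.7%, i.e. roughly 87%
```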
The compression uses Sherry, a 1.25-bit ternary scheme accepted at ACL 2026 that packs four weight values into five bits using a 3:4 sparsity pattern. The model card lists 33 languages plus five dialects and minority languages, covering 1,056 translation directions, a figure that matches 33 × 32 ordered language pairs.
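The bit count works out under one reading of the 3:4 pattern: if every group of four ternary weights contains exactly one zero (three nonzero out of four), there are 4 zero positions × 2^3 sign patterns = 32 valid groups, exactly 2^5, whereas unconstrained ternary would need log2(3^4) ≈ 6.34 bits per group. Below is a minimal sketch of that packing under this assumed interpretation; the function names are illustrative, not from the Sherry paper:

```python
# A minimal sketch of 5-bit packing for groups of four ternary weights,
# assuming "3:4 sparsity" means exactly one zero per group of four.
# pack_group/unpack_group are hypothetical names, not the Sherry API.

def pack_group(w):
    """Pack 4 ternary weights {-1, 0, +1}, exactly one zero, into 5 bits."""
    assert len(w) == 4 and w.count(0) == 1
    zero_pos = w.index(0)                  # 2 bits: which slot holds the zero
    signs = 0
    for v in w:
        if v != 0:                         # 3 bits: signs of the three nonzeros
            signs = (signs << 1) | (1 if v > 0 else 0)
    return (zero_pos << 3) | signs         # 4 * 2**3 = 32 codes = 2**5

def unpack_group(code):
    """Inverse of pack_group: recover the 4 ternary weights from a 5-bit code."""
    zero_pos, signs = code >> 3, code & 0b111
    out, bit = [], 2
    for i in range(4):
        if i == zero_pos:
            out.append(0)
        else:
            out.append(1 if (signs >> bit) & 1 else -1)
            bit -= 1
    return out

group = [1, 0, -1, 1]
assert pack_group(group) < 32                    # fits in 5 bits
assert unpack_group(pack_group(group)) == group  # round-trips
```

A real quantizer would also store per-group scale factors alongside these codes; the sketch covers only the index packing that yields the 1.25-bit average.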
Per its own technical report, Tencent says the quantized version surpasses Google Translate on benchmark evaluations. Independent results aren't out yet. The team also pitches the 1.8B model against Tower-Plus-72B and Qwen3-32B, both far larger open-source models, on internal scoring.
Weights and GGUF builds are on Hugging Face. An Android demo APK ships alongside the model card. No iOS version yet, and no timeline for one.
The release follows the December 2025 launch of the base 1.8B and 7B Hy-MT1.5 models. A 2-bit variant at 574MB is also available for users who want more quality headroom.
Bottom Line
Sherry's 1.25-bit ternary quantization shrinks Tencent's 1.8B translation model from 3.3GB to 440MB.
Quick Facts
- Model size: 440MB after 1.25-bit quantization
- Base model: 1.8B parameters, 3.3GB at FP16
- Languages: 33, plus 5 dialects and minority languages
- Translation directions: 1,056
- Release date: April 29, 2026
- Benchmarks: company-reported; no independent evaluations yet