Alibaba's Tongyi Lab pushed out Fun-ASR 1.5 on Monday, expanding its speech recognition model to 30 languages and adding mid-stream language switching. The rollout is live on Alibaba Cloud's Model Studio and the ModelScope community.
The headline feature: one model handles code-switching without toggling modes. Russian, English, German, and Japanese sit alongside Mandarin, seven Chinese dialect groups, and more than 20 regional accents. Output now includes automatic punctuation and formatted dates, numbers, and currencies, removing a step that transcription pipelines usually bolt on afterward.
Tongyi reports a 56.2% drop in character error rate on dialect scenarios versus the previous version, with five dialects clearing 90% accuracy. Recognition of classical Chinese poetry reaches 97% character-level accuracy. The numbers come from Tongyi's internal tests; no independent benchmarks have been published yet.
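For readers who want to sanity-check figures like these on their own audio, here is a minimal Python sketch of how character error rate (CER) is conventionally computed: the character-level edit distance between a reference transcript and the model output, divided by the reference length. Tongyi has not described its evaluation setup, so this is the textbook definition, not necessarily the lab's method. The function names are illustrative, and the 10% baseline in the usage lines is a hypothetical number chosen only to show what a 56.2% relative reduction would look like.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(old_cer: float, new_cer: float) -> float:
    """Fractional drop in error rate from old_cer to new_cer."""
    return (old_cer - new_cer) / old_cer


# Hypothetical baseline: a model at 10% CER that improves by 56.2%
# (relative) would land at roughly 4.38% CER.
hypothetical_old = 0.10
hypothetical_new = hypothetical_old * (1 - 0.562)
print(f"{hypothetical_new:.4f}", f"{relative_reduction(hypothetical_old, hypothetical_new):.3f}")
```

A relative reduction says nothing about the absolute starting point, which is one reason independent benchmarks would help put the 56.2% figure in context.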
Pricing runs pay-as-you-go. An hour of audio costs $0.32 outside China and $0.16 on the mainland, per the launch announcement. Free developer quotas cover testing in the Singapore region.
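A rough Python sketch of what those rates mean for a transcription budget follows. It assumes cost scales linearly with audio duration and ignores the free quota; the announcement, as reported here, gives only the per-hour prices, so the billing granularity is an assumption, and the region keys and function name are illustrative rather than anything from Alibaba Cloud.

```python
# Per-hour rates (USD per hour of audio) as quoted in the launch announcement.
RATE_USD_PER_HOUR = {
    "international": 0.32,   # outside China
    "mainland_china": 0.16,  # mainland China
}


def estimate_cost(audio_seconds: float, region: str) -> float:
    """Estimate the USD cost of transcribing audio_seconds of audio.

    Assumes simple linear billing by duration; actual metering
    (rounding, minimum charges, free quota) is not covered here.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = audio_seconds / 3600
    return round(hours * RATE_USD_PER_HOUR[region], 4)


# Example: a 1,000-hour audio archive at each rate.
for region in RATE_USD_PER_HOUR:
    print(region, estimate_cost(1000 * 3600, region))
```

Under these assumptions, a 1,000-hour archive would cost about $320 at the international rate and about $160 on the mainland.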
Bottom Line
Fun-ASR 1.5 now handles 30 languages in a single model, with Alibaba-reported dialect error rates down 56.2% from the previous version.
Quick Facts
- Launched April 20, 2026, by Alibaba's Tongyi Lab
- 30 supported languages including Russian, English, German, Japanese
- 7 Chinese dialect groups plus 20+ regional accents
- 56.2% character error rate reduction on dialects (company-reported)
- Pricing: $0.32 per hour of audio outside China, $0.16 in mainland China




