Alibaba's Tongyi Lab has released MAI-UI, a set of vision-language models designed to control smartphone interfaces through natural language commands. The models, built on the Qwen 3 VL backbone, come in four sizes: 2B, 8B, 32B, and a sparse 235B variant with 22B active parameters.
The headline numbers are strong. MAI-UI scored 76.7% on AndroidWorld, a benchmark that tests agents across 116 tasks in 20 Android apps running in a live emulator. That beats UI-Tars-2, Gemini-2.5-Pro, and ByteDance's Seed1.8. On ScreenSpot-Pro, which evaluates grounding in high-resolution professional interfaces, the largest model reached 73.5% (with a zoom-in technique), surpassing both Gemini-3-Pro and Seed1.8, according to Alibaba's self-reported results.
The release addresses practical deployment challenges that have hampered GUI agents. MAI-UI includes native support for MCP tool calls (enabling external API access during task execution), a device-cloud collaboration system that routes computation based on task complexity and data sensitivity, and an online reinforcement learning framework. The team reports that scaling parallel RL environments from 32 to 512 yielded a 5.2-point improvement, though independent verification isn't available yet.
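The device-cloud split can be pictured as a per-step routing decision. The sketch below is purely illustrative and is not Alibaba's implementation; every name and threshold in it is hypothetical, and it only demonstrates the two signals the team describes (task complexity and data sensitivity):

```python
# Hypothetical sketch of device-cloud routing for a GUI agent step.
# Not MAI-UI's actual logic; names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float  # 0.0 (trivial) .. 1.0 (hard), e.g. from a lightweight classifier
    sensitive: bool    # True if the screen shows private data (passwords, messages)

def route(task: Task, local_budget: float = 0.5) -> str:
    """Return 'device' or 'cloud' for a single agent step."""
    if task.sensitive:
        return "device"  # sensitive screens never leave the phone
    if task.complexity <= local_budget:
        return "device"  # a small on-device model (e.g. 2B/8B) suffices
    return "cloud"       # escalate hard tasks to the larger hosted model

print(route(Task(complexity=0.2, sensitive=False)))  # device
print(route(Task(complexity=0.9, sensitive=True)))   # device
print(route(Task(complexity=0.9, sensitive=False)))  # cloud
```

The appeal of this shape is that privacy acts as a hard constraint while complexity is a tunable cost trade-off, which matches the "complexity and data sensitivity" framing in the release.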
The 2B and 8B models are available now on Hugging Face under the Apache 2.0 license. The 32B and 235B variants haven't been publicly released.
The Bottom Line: Local GUI agents capable of useful smartphone automation are getting closer, with the 8B model small enough to run on consumer hardware.
QUICK FACTS
- AndroidWorld score: 76.7% (company-reported)
- ScreenSpot-Pro score: 73.5% with zoom-in (company-reported)
- Model sizes: 2B, 8B, 32B, 235B-A22B (sparse)
- Released: December 29, 2025
- License: Apache 2.0 (2B and 8B models only)
- Base architecture: Qwen 3 VL