
Alibaba Releases MAI-UI, a Family of GUI Agents for Smartphone Control

New models hit state-of-the-art on mobile navigation benchmarks

Andrés Martínez, AI Content Writer
December 30, 2025

Alibaba's Tongyi Lab has released MAI-UI, a set of vision-language models designed to control smartphone interfaces through natural language commands. The models, built on the Qwen 3 VL backbone, come in four sizes: 2B, 8B, 32B, and a sparse 235B variant with 22B active parameters.

The headline numbers are strong. MAI-UI hit 76.7% on AndroidWorld, a benchmark that tests agents across 116 tasks in 20 Android apps running in a live emulator. That beats UI-TARS-2, Gemini-2.5-Pro, and ByteDance's Seed1.8. On ScreenSpot-Pro, which evaluates grounding in high-resolution professional interfaces, the largest model reached 73.5% with a zoom-in technique, surpassing both Gemini-3-Pro and Seed1.8. All of these figures come from Alibaba's self-reported results.
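To put the AndroidWorld figure in concrete terms, here is a rough back-of-the-envelope calculation. It assumes the score is a plain per-task success rate over the 116 tasks, which the article does not confirm (benchmarks often average over multiple runs), and the function names are made up for illustration.

```python
# Illustrative arithmetic only. Assumption: the reported score is a simple
# fraction of tasks completed, with no averaging over seeds or retries.
TOTAL_TASKS = 116        # task count cited in the article
REPORTED_SCORE = 0.767   # 76.7%, company-reported


def implied_successes(score: float, total: int) -> int:
    """Nearest whole number of completed tasks implied by a success rate."""
    return round(score * total)


def success_rate(successes: int, total: int) -> float:
    """Fraction of tasks completed."""
    return successes / total


solved = implied_successes(REPORTED_SCORE, TOTAL_TASKS)
print(solved, f"{success_rate(solved, TOTAL_TASKS):.1%}")  # prints: 89 76.7%
```

Under that assumption, the reported score corresponds to roughly 89 of 116 tasks completed.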

The release addresses practical deployment challenges that have hampered GUI agents. MAI-UI includes native support for MCP tool calls (enabling external API access during task execution), a device-cloud collaboration system that routes computation based on task complexity and data sensitivity, and an online reinforcement learning framework. The team reports that scaling parallel RL environments from 32 to 512 yielded a 5.2-point improvement, though independent verification isn't available yet.
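The device-cloud idea can be pictured with a small routing function. This is a minimal sketch, not Alibaba's implementation: the `Task` fields, the `route` function, and the 0.6 threshold are all hypothetical, chosen only to show how complexity and data sensitivity might jointly decide where a step runs.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    DEVICE = "device"
    CLOUD = "cloud"


@dataclass
class Task:
    instruction: str
    complexity: float  # hypothetical score in [0, 1]; higher means harder
    sensitive: bool    # hypothetical flag: task touches private data


def route(task: Task, complexity_threshold: float = 0.6) -> Target:
    """Pick where to run a task (illustrative policy, not MAI-UI's)."""
    # Sensitive data stays on the device regardless of difficulty.
    if task.sensitive:
        return Target.DEVICE
    # Hard, non-sensitive tasks go to the larger cloud model.
    if task.complexity > complexity_threshold:
        return Target.CLOUD
    # Everything else is handled locally.
    return Target.DEVICE


print(route(Task("Open the banking app and check my balance", 0.8, True)))
print(route(Task("Compare flight prices across three apps", 0.8, False)))
```

In this sketch, sensitivity takes priority over complexity, so a hard task involving private data still runs locally. A real system could weigh the two differently.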

The 2B and 8B models are available now on Hugging Face under Apache 2.0. The 32B and 235B variants aren't publicly released.

The Bottom Line: Local GUI agents capable of useful smartphone automation are getting closer, with the 8B model small enough to run on consumer hardware.


QUICK FACTS

  • AndroidWorld score: 76.7% (company-reported)
  • ScreenSpot-Pro score: 73.5% with zoom-in (company-reported)
  • Model sizes: 2B, 8B, 32B, 235B-A22B (sparse)
  • Released: December 29, 2025
  • License: Apache 2.0 (2B and 8B models only)
  • Base architecture: Qwen 3 VL
Tags: MAI-UI, Alibaba, GUI agents, Qwen, smartphone automation, open source AI
