
Alibaba's AIDC Team Releases Ovis2.6-30B-A3B Vision Model

MoE vision-language model runs on 3B active parameters with a "Think with Image" feature.

Andrés Martínez, AI Content Writer
May 13, 2026 | 2 min read

Alibaba's AIDC-AI team has released Ovis2.6-30B-A3B, an open-source multimodal model that swaps the dense backbone of its predecessor for a Mixture-of-Experts (MoE) setup. The model card lists 30B total parameters, of which only about 3B are active at inference, and the model ships under Apache 2.0.
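For readers new to MoE, the gap between total and active parameters comes from routing each token through only a few experts. The following is a minimal sketch of generic top-k routing, not AIDC's implementation: the expert count, the value of k, and the toy "experts" are all assumptions chosen for illustration, since the article does not state Ovis2.6's routing configuration.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Run only the routed experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup (assumed values): 8 experts, 2 active per token.
# Each "expert" just scales its input; real experts are feed-forward blocks.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.5, 1.0]

routed = top_k_route(logits, k=2)
output = moe_forward(1.0, experts, logits, k=2)

# Figures from the model card as reported above: ~3B active of 30B total.
active_fraction = 3 / 30
```

In this toy run, six of the eight experts are never evaluated for the token, which is the same idea behind running roughly a tenth of the 30B parameters per token.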

The interesting bit is something the team calls "Think with Image." Most vision-language models look at an image once, encode it, and then reason over that fixed representation. Ovis2.6 can invoke cropping and rotation as tools mid-chain-of-thought, re-examining regions it flagged as ambiguous. That sounds closer to how a person actually reads a chart, though independent evaluations haven't landed yet.
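To make the idea concrete, here is a rough sketch of what applying crop and rotation as tools to an image might look like. Everything in it is hypothetical: the tool names, argument names, and the `apply_tool_calls` helper are illustrative and are not taken from the Ovis2.6 code, and the "image" is a small grid of numbers rather than real pixels.

```python
def crop(image, left, top, right, bottom):
    """Keep rows top..bottom-1 and columns left..right-1 of a 2D grid."""
    return [row[left:right] for row in image[top:bottom]]

def rotate90_clockwise(image):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

# Hypothetical tool registry; the names are assumptions for illustration.
TOOLS = {"crop": crop, "rotate": rotate90_clockwise}

def apply_tool_calls(image, calls):
    """Apply a sequence of (tool_name, kwargs) calls, returning the new view."""
    view = image
    for name, kwargs in calls:
        view = TOOLS[name](view, **kwargs)
    return view

# A 3x3 stand-in for an image.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Imagine the model flags the top-right region as ambiguous mid-reasoning,
# crops it, then rotates the crop before looking again.
calls = [
    ("crop", {"left": 1, "top": 0, "right": 3, "bottom": 2}),
    ("rotate", {}),
]
view = apply_tool_calls(image, calls)
```

The point of the sketch is the loop: each tool call produces a new view that can be fed back into the reasoning, instead of reasoning over a single fixed encoding.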

Specs are aimed at document-heavy work: a 64K token context, image resolution up to 2880×2880, and reinforced OCR, table, and chart-reasoning training. The architecture builds on the Ovis2.5 paper, with the MoE swap as the main upgrade.

AIDC's own benchmark tables put Ovis2.6 in range of larger Qwen3-VL variants and some closed models, though those numbers are self-reported and Qwen's scores in the comparison are the higher of its Think and Instruct modes. Early community testing on Reddit's LocalLLaMA pegged it as marginally ahead of Qwen3-VL-30B-A3B on similar tasks.

Code and inference examples for vLLM and transformers are on GitHub. The model supports grounding via normalized bounding-box and point coordinates, plus a thinking-budget parameter for controlling reasoning length.
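Normalized coordinates have to be scaled back to pixels before they can be drawn on an image. The sketch below assumes boxes arrive as (x1, y1, x2, y2) and points as (x, y), each in the range 0 to 1; the article does not specify the exact range or ordering Ovis2.6 uses, so check the repository before relying on this. The function names are illustrative.

```python
def denormalize_point(point, width, height):
    """Scale a normalized (x, y) point in [0, 1] to integer pixel coordinates."""
    x, y = point
    return (round(x * width), round(y * height))

def denormalize_box(box, width, height):
    """Scale a normalized (x1, y1, x2, y2) box in [0, 1] to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (*denormalize_point((x1, y1), width, height),
            *denormalize_point((x2, y2), width, height))

# Example on an image at the 2880x2880 ceiling mentioned above.
pixel_box = denormalize_box((0.25, 0.5, 0.75, 1.0), width=2880, height=2880)
pixel_point = denormalize_point((0.5, 0.5), width=1000, height=500)
```

Because the coordinates are normalized, the same model output maps correctly onto the image at any display resolution.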


Bottom Line

Ovis2.6-30B-A3B activates only ~3B parameters per token while supporting 64K context and 2880×2880 image resolution under Apache 2.0.

Quick Facts

  • 30B total parameters, ~3B active
  • 64K token context window
  • Image resolution up to 2880×2880
  • Apache 2.0 license
  • Benchmarks self-reported by AIDC-AI
Tags: Alibaba, open-source AI, vision language model, multimodal AI, mixture of experts, Ovis
