Alibaba AIDC Releases Ovis2.6-30B-A3B Open Vision Model

Alibaba's AIDC-AI team has released Ovis2.6-30B-A3B, an open-source multimodal model that swaps the dense backbone of its predecessor for a Mixture-of-Experts setup. The model card lists 30B total parameters, of which only about 3B are active at inference, and the model ships under Apache 2.0.

The interesting bit is something the team calls "Think with Image." Most vision-language models look at an image once, encode it, and then reason over that fixed representation. Ovis2.6 can invoke cropping and rotation as tools mid-chain-of-thought, re-examining regions it flagged as ambiguous. That sounds closer to how a person actually reads a chart, though independent evaluations haven't landed yet.
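To make the crop-and-rotate idea concrete, here is a minimal sketch of how a sequence of tool calls could be applied to an image before the model looks at it again. This is not Ovis2.6's actual tool interface: the names `crop`, `rotate90_clockwise`, `TOOLS`, and `apply_tool_calls` are hypothetical, and the image is a plain grid of integers rather than real pixel data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy image: a list of rows, each row a list of pixel values.
Image = List[List[int]]


def crop(image: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Return the region with columns [left, right) and rows [top, bottom)."""
    return [row[left:right] for row in image[top:bottom]]


def rotate90_clockwise(image: Image) -> Image:
    """Rotate the grid 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


# Hypothetical tool registry; the real model's tool names may differ.
TOOLS: Dict[str, Callable[..., Image]] = {
    "crop": crop,
    "rotate": rotate90_clockwise,
}


def apply_tool_calls(image: Image, calls: Sequence[Tuple[str, tuple]]) -> Image:
    """Apply each (tool_name, args) call in order and return the final view."""
    for name, args in calls:
        image = TOOLS[name](image, *args)
    return image


if __name__ == "__main__":
    page = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # Zoom into the top-right 2x2 region, then turn it upright.
    view = apply_tool_calls(page, [("crop", (1, 0, 3, 2)), ("rotate", ())])
    print(view)
```

In a real pipeline the resulting view would be re-encoded and fed back into the reasoning chain; this sketch covers only the image manipulation step.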

The specs target document-heavy work: a 64K-token context, image resolution up to 2880×2880, and strengthened training for OCR, table, and chart reasoning. The architecture builds on the Ovis2.5 paper, with the MoE swap as the main upgrade.

AIDC's own benchmark tables put Ovis2.6 in range of larger Qwen3-VL variants and some closed models, though those numbers are self-reported and Qwen's scores in the comparison are the higher of its Think and Instruct modes. Early community testing on Reddit's LocalLLaMA pegged it as marginally ahead of Qwen3-VL-30B-A3B on similar tasks.

Code and inference examples for vLLM and transformers are on GitHub. The model supports grounding via normalized bounding-box and point coordinates, plus a thinking-budget parameter for controlling reasoning length.
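Normalized grounding output has to be mapped back to pixels before it can be drawn or cropped. The sketch below assumes coordinates are normalized to the range 0 to 1 and that boxes arrive as (x1, y1, x2, y2); the article does not specify either, so check the model card for the actual output format. The helper names are hypothetical.

```python
from typing import Tuple


def clamp01(value: float) -> float:
    """Clamp a value into [0, 1] to guard against slightly out-of-range output."""
    return min(1.0, max(0.0, value))


def denormalize_point(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map a normalized (x, y) point to pixel coordinates (assumes a 0-1 range)."""
    return round(clamp01(x) * width), round(clamp01(y) * height)


def denormalize_box(
    box: Tuple[float, float, float, float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Map a normalized (x1, y1, x2, y2) box to pixel (left, top, right, bottom)."""
    x1, y1, x2, y2 = box
    # Sort the corners so the result is well-formed even if they arrive swapped.
    left, top = denormalize_point(min(x1, x2), min(y1, y2), width, height)
    right, bottom = denormalize_point(max(x1, x2), max(y1, y2), width, height)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box covering the lower-middle of a 2880x2880 image.
    print(denormalize_box((0.25, 0.5, 0.75, 1.0), 2880, 2880))
```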

Bottom Line

Ovis2.6-30B-A3B activates only ~3B parameters per token while supporting 64K context and 2880×2880 image resolution under Apache 2.0.

Quick Facts

30B total parameters, ~3B active
64K token context window
Image resolution up to 2880×2880
Apache 2.0 license
Benchmarks self-reported by AIDC-AI

Tags: Alibaba, open-source AI, vision language model, multimodal AI, mixture of experts, Ovis

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
