A group of researchers led by ByteDance released BitDance today, a 14-billion-parameter autoregressive model for image generation that replaces conventional VQ codebooks with binary tokens. The model, weights, and code are all open under Apache 2.0.
Autoregressive image models have a reputation problem: slow inference, mediocre quality compared to diffusion, and tokenizers that lose too much detail. BitDance tries to fix all three at once. Its binary tokenizer uses group-wise lookup-free quantization to reach a vocabulary of 2^256 possible tokens, far larger than the 65,536 used by Cosmos or the 16,384 used by LlamaGen. On reconstruction quality, the team reports a PSNR of 25.29 at 32x downsampling, compared to 24.81 for the continuous DC-AE tokenizer. Those are self-reported numbers from the technical paper, so treat them accordingly.
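The core idea of lookup-free quantization is that there is no codebook to search: each latent dimension is binarized by its sign, and the resulting bit string *is* the token. Here is a minimal NumPy sketch of that idea, with illustrative shapes and names (not BitDance's actual code; the group count and latent width are assumptions):

```python
import numpy as np

def lookup_free_quantize(z, num_groups):
    """Sign-binarize a continuous latent; no codebook lookup needed.

    z: array of shape (..., d). Each dimension becomes one bit, so a
    d-dim latent indexes an implicit vocabulary of 2^d (d = 256 in
    BitDance's reported setup). Splitting the bits into groups keeps
    each sub-code small enough to handle separately ("group-wise").
    """
    bits = (z > 0).astype(np.int64)        # (..., d) binary token code
    q = np.where(z > 0, 1.0, -1.0)         # quantized latent on {-1, +1}^d
    groups = bits.reshape(*bits.shape[:-1], num_groups, -1)
    return q, bits, groups

# Toy 8-dim latent split into 2 groups of 4 bits each.
z = np.array([0.5, -1.2, 0.1, -0.3, 2.0, -0.1, 0.7, -0.9])
q, bits, groups = lookup_free_quantize(z, num_groups=2)
```

In training, the non-differentiable sign step is typically bridged with a straight-through estimator; the sketch above shows only the forward quantization.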
Sampling from a vocabulary that large is its own problem: a standard classification head would need a weight matrix with 2^256 rows, which cannot be materialized. BitDance sidesteps this with a diffusion head that models the bits as points on a continuous hypercube using velocity matching, then snaps them to discrete values at the end. For speed, the model predicts patches of 16 or 64 tokens in parallel rather than one at a time, using a block-wise causal mask that preserves spatial dependencies within each patch.
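The block-wise causal mask is easy to picture: tokens attend bidirectionally to everything within their own patch, and causally to all tokens in earlier patches. A minimal NumPy sketch, with hypothetical names (this is an illustration of the masking pattern, not BitDance's implementation):

```python
import numpy as np

def block_causal_mask(num_tokens, block_size):
    """Boolean attention mask: True where attention is allowed.

    Token i may attend to token j iff j's block is not later than
    i's block. Within a block, attention is fully bidirectional;
    across blocks, it is causal.
    """
    block_id = np.arange(num_tokens) // block_size
    return block_id[:, None] >= block_id[None, :]

# 6 tokens in blocks of 2: blocks are [0, 0, 1, 1, 2, 2].
mask = block_causal_mask(6, 2)
```

With block_size=1 this reduces to the ordinary causal mask of next-token AR; with block_size=16 or 64, all tokens of a patch can be sampled in one forward step, which is where the parallel speedup comes from.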
Two model variants ship on Hugging Face: BitDance-14B-16x and BitDance-14B-64x, both fine-tuned from Qwen3-14B-Base. The 64x variant claims over 30x speedup versus standard next-token AR, though that comparison is against vanilla token-by-token generation, not distilled diffusion models that already produce images in 4-8 steps. Max resolution is 1 megapixel. Full source code is on GitHub.
Bottom Line
BitDance ships two open-weight 14B models under Apache 2.0 that generate images at up to 64 tokens per step using a 2^256 binary vocabulary.
Quick Facts
- 14 billion parameters, fine-tuned from Qwen3-14B-Base
- Binary tokenizer vocabulary: 2^256 (vs. 65,536 for Cosmos)
- Two variants: 16-token and 64-token parallel prediction
- PSNR 25.29 at 32x downsampling (self-reported)
- License: Apache 2.0