
BitDance Drops 14B Autoregressive Image Model With Binary Tokens

ByteDance-backed team open-sources a 14B AR image model generating 64 tokens per step.

Andrés Martínez, AI Content Writer
February 17, 2026 · 2 min read
Abstract visualization of binary tokens forming a high-resolution image through an autoregressive pipeline

A group of researchers led by ByteDance released BitDance today, a 14-billion-parameter autoregressive model for image generation that replaces conventional VQ codebooks with binary tokens. The model, weights, and code are all open under Apache 2.0.

Autoregressive image models have a reputation problem: slow inference, mediocre quality compared to diffusion, and tokenizers that lose too much detail. BitDance tries to fix all three at once. Its binary tokenizer uses group-wise lookup-free quantization to reach a vocabulary of 2^256 possible tokens, far larger than the 65,536-entry codebook of Cosmos or the 16,384 of LlamaGen. On reconstruction quality, the team reports a PSNR of 25.29 at 32x downsampling, compared to 24.81 for the continuous DC-AE tokenizer. Those are self-reported numbers from the technical paper, so treat them accordingly.
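The core idea behind lookup-free quantization is that no codebook lookup is needed: each latent dimension is simply binarized by its sign, and the resulting bit string is the token. Here is a minimal numpy sketch of that idea; the function name is hypothetical, and the actual BitDance tokenizer (grouping scheme, straight-through gradients, training losses) is specified in the paper, not here.

```python
import numpy as np

def lookup_free_binary_quantize(z):
    """Quantize continuous latents to binary codes by sign, with no codebook.

    z: (num_tokens, num_bits) array of encoder outputs.
    Returns codes in {-1, +1} and, for small num_bits, the integer token id
    each code corresponds to. With 256 bits per token the implied vocabulary
    is 2**256, far too large to enumerate, which is why no explicit codebook
    or classification head over it can exist.
    """
    codes = np.where(z >= 0, 1, -1)  # sign of each latent dim -> one bit
    # Read each row of bits as a little-endian integer id (demo only;
    # object dtype avoids overflow for larger bit widths).
    weights = 2 ** np.arange(z.shape[1], dtype=object)
    ids = ((codes > 0).astype(object) * weights).sum(axis=1)
    return codes, ids
```

For example, a 3-bit latent `[1.0, -1.0, 1.0]` quantizes to the code `[1, -1, 1]`, i.e. token id 5 out of a 2^3 = 8 vocabulary; the same mechanism scales to 256 bits without ever materializing the vocabulary.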

Sampling from a vocabulary that large is its own problem: a standard classification head would need a weight matrix with 2^256 output columns, which is not remotely feasible. BitDance sidesteps this with a diffusion head that models the bits as continuous values on a hypercube using velocity matching, then snaps them to discrete values at the end. For speed, the model predicts patches of 16 or 64 tokens in parallel rather than one at a time, using a block-wise causal mask that keeps attention bidirectional within each patch while preserving causal ordering across patches.
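The block-wise causal mask described above can be sketched in a few lines: tokens attend freely within their own block and to all earlier blocks, but never to later ones. This is an illustrative numpy construction under that stated assumption, not BitDance's actual attention code.

```python
import numpy as np

def blockwise_causal_mask(num_tokens, block_size):
    """Boolean attention mask for block-parallel autoregressive decoding.

    allowed[i, j] is True when token i may attend to token j:
    full bidirectional attention inside a block, causal across blocks.
    """
    blocks = np.arange(num_tokens) // block_size  # block index of each token
    allowed = blocks[:, None] >= blocks[None, :]  # attend to own + earlier blocks
    return allowed
```

With `num_tokens=4, block_size=2`, token 0 can attend to token 1 (same block) but not to tokens 2 or 3 (a later block), so each block of 16 or 64 tokens can be sampled in one parallel step while the next block still conditions on everything before it.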

Two model variants ship on Hugging Face: BitDance-14B-16x and BitDance-14B-64x, both fine-tuned from Qwen3-14B-Base. The 64x variant claims over 30x speedup versus standard next-token AR, though that comparison is against vanilla token-by-token generation, not distilled diffusion models that already produce images in 4-8 steps. Max resolution is 1 megapixel. Full source code is on GitHub.


Bottom Line

BitDance ships two open-weight 14B models under Apache 2.0 that generate images at up to 64 tokens per step using a 2^256 binary vocabulary.

Quick Facts

  • 14 billion parameters, fine-tuned from Qwen3-14B-Base
  • Binary tokenizer vocabulary: 2^256 (vs. 65,536 for Cosmos)
  • Two variants: 16-token and 64-token parallel prediction
  • PSNR 25.29 at 32x downsampling (self-reported)
  • License: Apache 2.0
Tags: BitDance, autoregressive image generation, binary tokens, ByteDance, text-to-image, open source, diffusion head
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


