A group of researchers led by ByteDance released BitDance today, a 14-billion-parameter autoregressive model for image generation that replaces conventional VQ codebooks with binary tokens. The model, weights, and code are all open under Apache 2.0.
Autoregressive image models have a reputation problem: slow inference, mediocre quality compared to diffusion, and tokenizers that lose too much detail. BitDance tries to fix all three at once. Its binary tokenizer uses group-wise lookup-free quantization to reach a vocabulary of 2^256 possible tokens, far larger than the 65,536 used by Cosmos or the 16,384 used by LlamaGen. On reconstruction quality, the team reports a PSNR of 25.29 at 32x downsampling, compared to 24.81 for the continuous DC-AE tokenizer. Those are self-reported numbers from the technical paper, so treat them accordingly.
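The core idea of lookup-free quantization is that there is no codebook to search: each latent dimension is binarized by its sign, and the resulting bit string *is* the token. Here is a minimal NumPy sketch of that idea, with illustrative shapes and names (not BitDance's actual code; the group count and latent width are assumptions):

```python
import numpy as np

def lookup_free_quantize(z, num_groups):
    """Sign-binarize a continuous latent; no codebook lookup needed.

    z: array of shape (..., d). Each dimension becomes one bit, so a
    d-dim latent indexes an implicit vocabulary of 2^d (d = 256 in
    BitDance's reported setup). Splitting the bits into groups keeps
    each sub-code small enough to handle separately ("group-wise").
    """
    bits = (z > 0).astype(np.int64)        # (..., d) binary token code
    q = np.where(z > 0, 1.0, -1.0)         # quantized latent on {-1, +1}^d
    groups = bits.reshape(*bits.shape[:-1], num_groups, -1)
    return q, bits, groups

# Toy 8-dim latent split into 2 groups of 4 bits each.
z = np.array([0.5, -1.2, 0.1, -0.3, 2.0, -0.1, 0.7, -0.9])
q, bits, groups = lookup_free_quantize(z, num_groups=2)
```

In training, the non-differentiable sign step is typically bridged with a straight-through estimator; the sketch above shows only the forward quantization.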
Sampling from a vocabulary that large is its own problem: a standard classification head would need a weight matrix with 2^256 rows, which cannot be materialized. BitDance sidesteps this with a diffusion head that models the bits as points on a continuous hypercube using velocity matching, then snaps them to discrete values at the end. For speed, the model predicts patches of 16 or 64 tokens in parallel rather than one at a time, using a block-wise causal mask that preserves spatial dependencies within each patch.
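The block-wise causal mask is easy to picture: tokens attend bidirectionally to everything within their own patch, and causally to all tokens in earlier patches. A minimal NumPy sketch, with hypothetical names (this is an illustration of the masking pattern, not BitDance's implementation):

```python
import numpy as np

def block_causal_mask(num_tokens, block_size):
    """Boolean attention mask: True where attention is allowed.

    Token i may attend to token j iff j's block is not later than
    i's block. Within a block, attention is fully bidirectional;
    across blocks, it is causal.
    """
    block_id = np.arange(num_tokens) // block_size
    return block_id[:, None] >= block_id[None, :]

# 6 tokens in blocks of 2: blocks are [0, 0, 1, 1, 2, 2].
mask = block_causal_mask(6, 2)
```

With block_size=1 this reduces to the ordinary causal mask of next-token AR; with block_size=16 or 64, all tokens of a patch can be sampled in one forward step, which is where the parallel speedup comes from.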
Two model variants ship on Hugging Face: BitDance-14B-16x and BitDance-14B-64x, both fine-tuned from Qwen3-14B-Base. The 64x variant claims over 30x speedup versus standard next-token AR, though that comparison is against vanilla token-by-token generation, not distilled diffusion models that already produce images in 4-8 steps. Max resolution is 1 megapixel. Full source code is on GitHub.
Bottom Line
BitDance ships two open-weight 14B models under Apache 2.0 that generate images at up to 64 tokens per step using a 2^256 binary vocabulary.
Quick Facts
- 14 billion parameters, fine-tuned from Qwen3-14B-Base
- Binary tokenizer vocabulary: 2^256 (vs. 65,536 for Cosmos)
- Two variants: 16-token and 64-token parallel prediction
- PSNR 25.29 at 32x downsampling (self-reported)
- License: Apache 2.0