Researchers at Huazhong University of Science and Technology and VIVO AI Lab have released Moebius, an image inpainting framework that fills removed or masked regions of a photo. What stands out is the size: 226 million parameters, against the 11.9 billion of Black Forest Labs' FLUX.1-Fill-Dev. The team has published a technical paper and put the code on GitHub.
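To put that size gap in concrete terms, the two parameter counts can be compared directly. The snippet below is only arithmetic on the figures quoted in this article; the variable names are illustrative.

```python
# Parameter counts as quoted in the article.
moebius_params = 226e6      # 226 million
flux_fill_params = 11.9e9   # 11.9 billion

ratio = flux_fill_params / moebius_params   # how many times larger FLUX.1-Fill-Dev is
share = moebius_params / flux_fill_params   # Moebius as a fraction of FLUX.1-Fill-Dev

print(f"FLUX.1-Fill-Dev is about {ratio:.1f}x larger")   # about 52.7x
print(f"Moebius is about {share:.1%} of its size")       # about 1.9%
```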
Across six benchmarks spanning natural scenes (Places2) and faces (CelebA-HQ, FFHQ), the authors report that Moebius rivals FLUX.1-Fill-Dev and beats it on complex textures and facial plausibility. These results are all self-reported, and the paper concedes that large masks and high-resolution crops still need independent testing. On its own FFHQ numbers, the paper claims FID gains of 37% to over 1200% against the larger industrial models, a spread wide enough to invite skepticism.
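One reason the spread looks so wide: FID is a lower-is-better score, so a "gain" above 100% is only possible if the improvement is expressed relative to the better model's own score rather than the baseline's. The article's source does not say which convention the paper uses, so the sketch below simply contrasts the two. The function names and FID values are hypothetical, chosen for illustration and not taken from the paper.

```python
def gain_vs_baseline(baseline_fid: float, model_fid: float) -> float:
    """Reduction as a fraction of the baseline score; capped at 1.0 (100%) for non-negative FID."""
    return (baseline_fid - model_fid) / baseline_fid


def gain_vs_model(baseline_fid: float, model_fid: float) -> float:
    """Reduction as a fraction of the model's own score; has no upper bound."""
    return (baseline_fid - model_fid) / model_fid


# Hypothetical FID values, for illustration only (not from the paper).
baseline_fid, model_fid = 26.0, 2.0

print(f"{gain_vs_baseline(baseline_fid, model_fid):.0%}")  # 92%
print(f"{gain_vs_model(baseline_fid, model_fid):.0%}")     # 1200%
```

The same pair of scores reads as a 92% reduction under one convention and a 1200% gain under the other, which is worth keeping in mind when comparing headline percentages.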
Inference runs at about 26 ms per step, for a total speedup the team pegs at more than 15x over 10B-class systems. Two design choices account for it: a block that condenses spatial and semantic context into fixed-size matrices, and distillation from a larger teacher model, PixelHacker, done in latent space.
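Latent-space distillation generally means training the smaller student to match the teacher's outputs in a compressed latent representation rather than in pixel space. The sketch below shows a generic version of that idea using a mean-squared error between two latent vectors. It is not the paper's method: the function name, the choice of MSE, and the toy values are all assumptions made for illustration.

```python
from typing import Sequence


def latent_distill_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean-squared error between student and teacher latents (assumed loss, illustration only)."""
    if len(student) != len(teacher):
        raise ValueError("latents must have the same length")
    if not student:
        raise ValueError("latents must be non-empty")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy flattened latents; real ones would be multi-dimensional tensors.
teacher_latent = [0.5, -1.0, 2.0, 0.0]
student_latent = [0.0, -1.0, 1.0, 0.0]

loss = latent_distill_loss(student_latent, teacher_latent)
print(loss)  # 0.3125
```

Comparing latents instead of decoded images keeps the training signal in the same low-dimensional space the student operates in, which is the usual motivation for this setup.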
The technical report hit number one on Hugging Face's daily papers ranking, and the project has been accepted to ECCV 2026. Weights are live on Hugging Face: a base checkpoint plus versions fine-tuned on Places2, CelebA-HQ, and FFHQ. VIVO, the lab's parent, makes smartphones, so the edge-deployment angle isn't incidental.
Bottom Line
Moebius runs image inpainting with 226M parameters at about 26 ms per step, and its authors claim parity with the 11.9B-parameter FLUX.1-Fill-Dev.
Quick Facts
- 226 million (0.22B) parameters
- FLUX.1-Fill-Dev comparison point: 11.9 billion parameters
- 26.01 ms per-step inference latency (self-reported)
- >15x total inference speedup, self-reported
- Accepted at ECCV 2026
- Evaluated on 6 benchmarks across Places2, CelebA-HQ, and FFHQ




