Liquid AI dropped LFM2.5-8B-A1B, a Mixture-of-Experts model that holds 8.3B total parameters but fires only about 1.5B per token. That sparsity is the whole point: it keeps the thing runnable on phones, laptops, and robots without a cloud connection. The release is detailed on the company's blog, and the weights are already up on Hugging Face.
It's a reasoning-only model this time. Unlike the earlier LFM2-8B-A1B, it produces an explicit chain of thought before answering, a choice the team says works because each reasoning token stays cheap on a sparse MoE. The blog also bumps context from the previous 32,768 to 128,000 tokens and roughly triples pretraining to 38T tokens, with large-scale RL layered on top.
Liquid claims it's "comparable to models with up to 4x its size" on tool calling, though that's the vendor's own framing and benchmarks are largely self-reported so far. One reviewer noted the model stumbled on a basic greeting prompt in early community testing. An AIME 2026 evaluation result was added to the model card within hours of launch.
The base and post-trained checkpoints ship with day-one support for llama.cpp, MLX, vLLM, and SGLang, and fine-tuning fits on a single GPU. You can poke at it now in the Liquid Playground. Independent numbers on MMLU-Pro, IFEval, and BFCL are what will actually settle the size-class claims.
Bottom Line
LFM2.5-8B-A1B carries 8.3B total parameters but activates only about 1.5B per token, with weights live on Hugging Face now.
Quick Facts
- 8.3B total parameters, 1.5B active per token
- 128K context window, up from 32,768
- 38T pretraining tokens plus large-scale RL
- 24 layers: 18 LIV convolution blocks, 6 GQA layers
- Tool calling "comparable to models with up to 4x its size" is a company-reported claim
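
For a feel of what those numbers imply, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the bytes-per-parameter values (2 for 16-bit weights, 0.5 for 4-bit) are assumptions for illustration, not Liquid's published footprints, and the estimate ignores quantization overhead, KV cache, and activations. The function name `summarize` is hypothetical.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 8.3e9      # total parameters
ACTIVE_PARAMS = 1.5e9     # parameters activated per token (approximate)
OLD_CONTEXT = 32_768      # previous context window, in tokens
NEW_CONTEXT = 128_000     # new context window, in tokens
CONV_LAYERS = 18          # LIV convolution blocks
GQA_LAYERS = 6            # grouped-query attention layers

# Assumed storage costs (NOT from the article): bytes per parameter.
BYTES_PER_PARAM_16BIT = 2.0
BYTES_PER_PARAM_4BIT = 0.5


def summarize() -> dict:
    """Derive a few ratios and rough weight sizes from the quoted figures."""
    return {
        # Share of the model that does work on any single token.
        "active_fraction": ACTIVE_PARAMS / TOTAL_PARAMS,
        # How much longer the context window is than before.
        "context_growth": NEW_CONTEXT / OLD_CONTEXT,
        # Sanity check on the layer breakdown.
        "total_layers": CONV_LAYERS + GQA_LAYERS,
        # Weights-only size estimates in gigabytes (1 GB = 1e9 bytes).
        "weights_gb_16bit": TOTAL_PARAMS * BYTES_PER_PARAM_16BIT / 1e9,
        "weights_gb_4bit": TOTAL_PARAMS * BYTES_PER_PARAM_4BIT / 1e9,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

The takeaway: roughly 18% of the parameters do work on any given token, which is where the per-token compute savings come from. All 8.3B weights still have to sit in memory, though, so sparsity helps speed far more than it helps storage.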




