Liquid AI released LFM2.5-350M, a 350-million-parameter language model built for agentic workloads on devices too constrained for typical LLMs. The model runs in under 1GB of memory and supports llama.cpp, MLX, and vLLM out of the box. Liquid recommends it specifically for data extraction, structured outputs, and tool use, not for knowledge-heavy tasks or coding.
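As a sketch of the structured-extraction workload Liquid recommends, the loop below prompts a model for JSON and validates the result before use. The `call_model` function is a stub standing in for local inference (e.g. via llama.cpp bindings); the prompt format and field names are illustrative assumptions, not Liquid's API.

```python
import json

def call_model(prompt: str) -> str:
    """Placeholder for an on-device LFM2.5-350M call.
    Returns a canned JSON response so the sketch is self-contained."""
    return '{"invoice_id": "INV-001", "total": 42.50}'

def extract(document: str, schema_keys: list[str]) -> dict:
    # Ask the model for exactly the fields we need, as JSON.
    prompt = (
        "Extract the following fields as JSON: "
        + ", ".join(schema_keys)
        + "\n\nDocument:\n"
        + document
    )
    raw = call_model(prompt)
    data = json.loads(raw)  # fail fast on malformed model output
    missing = [k for k in schema_keys if k not in data]
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return data

result = extract("Invoice INV-001, total due $42.50", ["invoice_id", "total"])
print(result)
```

Validating the parsed JSON against the requested keys is the part that matters on-device: a 350M model will occasionally drop a field, and catching that locally avoids a retry against a cloud endpoint.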
The jump from LFM2 to LFM2.5 comes down to training scale. Pretraining went from 10 trillion to 28 trillion tokens, and Liquid added multi-stage reinforcement learning on top of supervised fine-tuning and preference alignment. On the inference side, the company reports decode speeds of 313 tokens per second on AMD CPUs and 188 tok/s on Snapdragon Gen4, though these are Liquid's own numbers. The technical paper for the underlying LFM2 architecture details the hybrid design: gated short convolutions handle most computation, with only about 20% of layers relying on attention.
At 350M parameters, this sits well below the 1B-class models that dominate on-device conversations. Liquid's earlier LFM2-350M already competed with Qwen3-0.6B despite being smaller, according to the company's LFM2 blog post, though all benchmarks were run on Liquid's internal evaluation suite. The 2.5 update extends that foundation with RL-tuned instruction following. Weights are open under Liquid's custom license, which requires a separate commercial license above $10 million in annual revenue.
The practical pitch: run lightweight agent loops, document parsing, or function calling on phones, laptops, and IoT hardware without a cloud roundtrip. Pricing and API access weren't part of this release. Liquid also operates a developer platform called LEAP for fine-tuning and deployment.
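A lightweight agent loop of the kind pitched above can be sketched as follows. The tool registry, the JSON tool-call format, and the `call_model` stub are all assumptions for illustration; Liquid's actual function-calling schema may differ.

```python
import json

# Hypothetical tool registry for an on-device agent.
TOOLS = {
    "get_battery": lambda: {"percent": 80},
}

def call_model(messages: list[dict]) -> str:
    """Stub standing in for local LFM2.5-350M inference.
    Emits a JSON tool call first, then a plain-text answer."""
    if messages[-1]["role"] == "user":
        return '{"tool": "get_battery", "args": {}}'
    return "Battery is at 80%."

def agent_loop(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    reply = ""
    for _ in range(4):  # cap iterations to keep the loop bounded
        reply = call_model(messages)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        # Execute the requested tool and feed the result back.
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return reply

print(agent_loop("How much battery is left?"))
```

The whole loop runs locally: the model call, the tool dispatch, and the result round-trip never leave the device, which is the point of fitting the model into sub-1GB memory.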
Bottom Line
LFM2.5-350M fits agent-capable inference into sub-1GB memory by tripling its training data to 28T tokens and adding reinforcement learning to a hybrid conv-attention architecture.
Quick Facts
- 350M parameters, 32K context length
- 28T tokens pretraining (up from 10T in LFM2)
- 313 tok/s decode on AMD CPU, 188 tok/s on Snapdragon Gen4 (company-reported)
- Runs in under 1GB of memory with llama.cpp, MLX, and vLLM support
- Open weights under LFM Open License v1.0 (commercial license required above $10M revenue)