Liquid AI released LFM2.5-350M, a 350-million-parameter language model built for agentic workloads on devices too constrained for typical LLMs. The model runs in under 1GB of memory and supports llama.cpp, MLX, and vLLM out of the box. Liquid recommends it specifically for data extraction, structured outputs, and tool use, not for knowledge-heavy tasks or coding.
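As a sketch of the structured-extraction workload Liquid recommends, the loop below prompts a model for JSON and validates the result before use. The `call_model` function is a stub standing in for local inference (e.g. via llama.cpp bindings); the prompt format and field names are illustrative assumptions, not Liquid's API.

```python
import json

def call_model(prompt: str) -> str:
    """Placeholder for an on-device LFM2.5-350M call.
    Returns a canned JSON response so the sketch is self-contained."""
    return '{"invoice_id": "INV-001", "total": 42.50}'

def extract(document: str, schema_keys: list[str]) -> dict:
    # Ask the model for exactly the fields we need, as JSON.
    prompt = (
        "Extract the following fields as JSON: "
        + ", ".join(schema_keys)
        + "\n\nDocument:\n"
        + document
    )
    raw = call_model(prompt)
    data = json.loads(raw)  # fail fast on malformed model output
    missing = [k for k in schema_keys if k not in data]
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return data

result = extract("Invoice INV-001, total due $42.50", ["invoice_id", "total"])
print(result)
```

Validating the parsed JSON against the requested keys is the part that matters on-device: a 350M model will occasionally drop a field, and catching that locally avoids a retry against a cloud endpoint.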
The jump from LFM2 to LFM2.5 comes down to training scale. Pretraining went from 10 trillion to 28 trillion tokens, and Liquid added multi-stage reinforcement learning on top of supervised fine-tuning and preference alignment. On the inference side, the company reports decode speeds of 313 tokens per second on AMD CPUs and 188 tok/s on Snapdragon Gen4, though these are Liquid's own numbers. The technical paper for the underlying LFM2 architecture details the hybrid design: gated short convolutions handle most computation, with only about 20% of layers relying on attention.
At 350M parameters, this sits well below the 1B-class models that dominate on-device conversations. Liquid's earlier LFM2-350M already competed with Qwen3-0.6B despite being smaller, according to the company's LFM2 blog post, though all benchmarks were run on Liquid's internal evaluation suite. The 2.5 update extends that foundation with RL-tuned instruction following. Weights are open under Liquid's custom license, which requires a separate commercial license above $10 million in annual revenue.
The practical pitch: run lightweight agent loops, document parsing, or function calling on phones, laptops, and IoT hardware without a cloud roundtrip. Pricing and API access weren't part of this release. Liquid also operates a developer platform called LEAP for fine-tuning and deployment.
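A lightweight agent loop of the kind pitched above can be sketched as follows. The tool registry, the JSON tool-call format, and the `call_model` stub are all assumptions for illustration; Liquid's actual function-calling schema may differ.

```python
import json

# Hypothetical tool registry for an on-device agent.
TOOLS = {
    "get_battery": lambda: {"percent": 80},
}

def call_model(messages: list[dict]) -> str:
    """Stub standing in for local LFM2.5-350M inference.
    Emits a JSON tool call first, then a plain-text answer."""
    if messages[-1]["role"] == "user":
        return '{"tool": "get_battery", "args": {}}'
    return "Battery is at 80%."

def agent_loop(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    reply = ""
    for _ in range(4):  # cap iterations to keep the loop bounded
        reply = call_model(messages)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        # Execute the requested tool and feed the result back.
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return reply

print(agent_loop("How much battery is left?"))
```

The whole loop runs locally: the model call, the tool dispatch, and the result round-trip never leave the device, which is the point of fitting the model into sub-1GB memory.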
Bottom Line
LFM2.5-350M fits agent-capable inference into sub-1GB memory by tripling its training data to 28T tokens and adding reinforcement learning to a hybrid conv-attention architecture.
Quick Facts
- 350M parameters, 32K context length
- 28T tokens pretraining (up from 10T in LFM2)
- 313 tok/s decode on AMD CPU, 188 tok/s on Snapdragon Gen4 (company-reported)
- Runs in under 1GB of memory with llama.cpp, MLX, and vLLM support
- Open weights under LFM Open License v1.0 (commercial license required above $10M revenue)