StepFun Step 3.7 Flash: Open-Weight 198B Agentic MoE Model

Abstract visualization of a distributed mixture-of-experts neural network routing tokens between specialized nodes

StepFun shipped Step 3.7 Flash, a 198B sparse Mixture-of-Experts vision-language model released under Apache 2.0. The model card pegs it at roughly 11B active parameters per token, a 256K context window, and three selectable reasoning levels. Throughput tops out at 400 tokens per second.

The pitch is agent efficiency, not raw size. StepFun built it to parse documents, charts, and UIs, then write code or call tools without the toolcall drift that breaks long agent loops. The company claims 98%+ on the tau-2 benchmark across all difficulty tiers, plus #1 placements on ClawEval-1.1 (67.1) and SimpleVQA Search (79.2). All self-reported. Independent runs on standard multimodal evals like MMMU and MATH-Vision haven't surfaced yet.

Weights are live on GitHub alongside GGUF quants for local hosting. With 128GB of unified memory, it runs on a Mac Studio, DGX Station, or AMD Ryzen AI Max+ 395 box. Inference works through vLLM, SGLang, llama.cpp, and Hugging Face Transformers.

API access is open now on StepFun's open platform, with OpenRouter and NVIDIA NIM also listed. One reporting outlet put pricing at $0.20 input and $1.15 output per million tokens, undercutting Western mid-tier models, though StepFun's own docs don't headline those figures. DeepInfra, Fireworks AI, and Modal are slated to add the model soon.

Bottom Line

Step 3.7 Flash ships as open weights under Apache 2.0 with ~11B active parameters and a 256K context, runnable on a 128GB Mac Studio.

Quick Facts

198B total parameters, ~11B active per token
Up to 400 tokens per second
256K context window, three reasoning levels
98%+ on tau-2 benchmark (company-reported)
Runs locally on 128GB unified memory devices

Tags:StepFunopen-weight modelsmixture-of-expertsAI agentsmultimodal AIApache 2.0Step 3.7 Flash

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

StepFun Releases Open-Weight Step 3.7 Flash for Agentic Work

Bottom Line

Quick Facts

Andrés Martínez

Related Articles

Xiaomi Cuts MiMo-V2.5 API Prices by Up to 99%

PrismML Shrinks a 4B Image Model to Run on iPhones

DeepSeek Sparse Attention Gets a From-Scratch Implementation Built for Reading

Stay Ahead of the AI Curve