
AMD Opens Ryzen AI Halo Pre-Orders at $3,999 With 128GB Memory

AMD's Strix Halo mini PC for local AI went up for pre-order at $3,999, undercutting Nvidia's DGX Spark by $700.

Oliver Senti, Senior AI Editor
June 18, 2026 | 3 min read
[Image: Compact AMD Ryzen AI Halo mini PC on a desk running a local large language model]

On June 12, AMD opened pre-orders for the Ryzen AI Halo developer platform, a Strix Halo mini PC built to run large language models locally. Lisa Su first showed it at CES in January. The pitch is 128GB of unified memory in a box smaller than a hardback book, priced at $3,999 through Micro Center.

What you actually get

The chip is the Ryzen AI Max+ 395: 16 Zen 5 cores, Radeon 8060S graphics, and a 50 TOPS NPU that, per most local-AI testers, sits idle during normal LLM work. Every configuration ships with 128GB of LPDDR5X-8000 and a 2TB SSD. AMD's product page claims support for models up to 200 billion parameters, depending on quantization.

That memory number is the whole point. Because the CPU and GPU draw from the same pool, you can load a 70B model at 8-bit quantization (about 70GB of weights) without sharding it across cards. No consumer discrete GPU does that; the RTX 5090 caps at 32GB.
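To see why quantization decides what fits, here is a back-of-envelope sketch of weight memory: parameters times bytes per weight. It is a rough estimate under stated assumptions, not AMD's sizing method. It counts weights only (no KV cache, activations, or runtime overhead), and the 128GB figure is the total pool; how much of it the GPU can actually address depends on OS and driver settings, so treat "fits" as optimistic.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GB (1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead, so real usage
    will be somewhat higher than this number.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of params * bytes per param = billions of bytes = GB
    return params_billions * bytes_per_weight


UNIFIED_MEMORY_GB = 128  # Ryzen AI Halo total pool (not all of it is GPU-addressable)
RTX_5090_VRAM_GB = 32    # consumer discrete card cited in the article

for params in (70, 200):
    for bits in (16, 8, 4):
        need = weight_footprint_gb(params, bits)
        print(
            f"{params}B @ {bits:>2}-bit: {need:6.1f} GB | "
            f"fits Halo: {need <= UNIFIED_MEMORY_GB} | "
            f"fits RTX 5090: {need <= RTX_5090_VRAM_GB}"
        )
```

A 70B model at 16-bit needs about 140GB, so it does not fit even in 128GB; at 8-bit it drops to about 70GB. A 200B model only squeezes in at roughly 4-bit (about 100GB), which is what "depending on quantization" is doing in AMD's claim. Real 4-bit formats carry some per-block overhead, so read these numbers as lower bounds.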

So why not just buy one now?

You can. Third-party mini PCs built on the same Strix Halo silicon have been shipping for months. The GMKtec EVO-X2 with 96GB of RAM sells for around $2,349, and Framework's 128GB desktop has hovered near $1,999. AMD's first-party box costs more, and the company is betting the premium buys you preinstalled playbooks, ROCm support out of the gate, and Windows and Linux dual-booting. The source report's figure of around $1,800 for comparable hardware is optimistic; the real floor for a 128GB configuration is closer to two grand.

The comparison AMD wants you to make is against Nvidia's DGX Spark, which now retails for $4,699. Spark launched at the same $3,999 last year before memory shortages pushed the price up. AMD undercuts it by $700 and adds native Windows. Nvidia counters with a more mature software stack, which for a lot of developers still settles the argument.
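The pricing argument is easier to see as dollars per gigabyte of memory. This is a quick sketch using only the street prices quoted above, which move around, so treat them as a snapshot. The GMKtec box has 96GB rather than 128GB, so its per-GB figure is not a like-for-like comparison.

```python
# Street prices quoted in the article (USD) and memory size (GB); a snapshot only.
systems = {
    "AMD Ryzen AI Halo": (3999, 128),
    "Nvidia DGX Spark": (4699, 128),
    "Framework Desktop (128GB)": (1999, 128),
    "GMKtec EVO-X2 (96GB)": (2349, 96),
}


def dollars_per_gb(price: float, memory_gb: float) -> float:
    """Price divided by memory capacity, in USD per GB."""
    return price / memory_gb


halo_price = systems["AMD Ryzen AI Halo"][0]
for name, (price, mem) in systems.items():
    # Signed difference versus the Halo: positive means more expensive.
    print(
        f"{name:28s} ${price:>5,}  ${dollars_per_gb(price, mem):6.2f}/GB  "
        f"vs Halo: {price - halo_price:+,}"
    )
```

The Halo sits about $700 below the Spark but roughly $2,000 above Framework's 128GB desktop on the same chip, which is the premium AMD is asking buyers to pay for software and support.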

The catch nobody puts on the box

Unified memory bandwidth runs around 256 GB/s. High-end discrete VRAM clears 1,000 GB/s. So while the Halo can hold a giant model that a 24GB RTX 4090 physically can't, it churns through tokens more slowly. Benchmarks put 70B inference around 14 tokens per second. Fine for tinkering and overnight jobs. Painful if you expected snappy chat. Image generation lands roughly 3 to 4 times slower than on a 4090. This is a capacity play, not a speed one, and AMD's own framing leans on memory size rather than throughput for a reason.
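The speed gap follows from a common rule of thumb for single-user generation: a dense model reads roughly all of its weights for every token, so memory bandwidth divided by weight size caps tokens per second. This is a minimal sketch, assuming a 256-bit memory bus for the 256 GB/s figure and a 70B model stored at 4 bits per weight, with KV-cache traffic and compute limits ignored.

```python
# Peak bandwidth from the memory spec: transfers/s * bus width in bytes.
# The 256-bit bus width is an assumption about Strix Halo's memory interface.
MT_PER_S = 8000e6                     # LPDDR5X-8000
BUS_BYTES = 256 // 8                  # 256-bit interface = 32 bytes per transfer
HALO_BW = MT_PER_S * BUS_BYTES / 1e9  # -> 256.0 GB/s
DISCRETE_BW = 1000.0                  # GB/s, the article's round figure for discrete VRAM


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes each generated token streams every weight from memory once
    (batch size 1) and ignores KV-cache traffic and compute limits.
    """
    return bandwidth_gb_s / weights_gb


weights_70b_q4 = 70 * 4 / 8  # ~35 GB: 70B params at 4 bits per weight

print(f"Halo peak bandwidth: {HALO_BW:.0f} GB/s")
print(f"Halo ceiling:        {decode_ceiling_tok_s(HALO_BW, weights_70b_q4):.1f} tok/s")
print(f"Discrete ceiling:    {decode_ceiling_tok_s(DISCRETE_BW, weights_70b_q4):.1f} tok/s")
print(f"Bandwidth ratio:     {DISCRETE_BW / HALO_BW:.1f}x")
```

Under these assumptions a dense 70B model at 4-bit tops out near 7 tokens per second on the Halo. A reported figure well above that ceiling for a 70B model would imply a smaller weight format, a setup that gets more than one token per pass over the weights (speculative decoding), or one that touches only a fraction of them (a mixture-of-experts model). The discrete-GPU line is hypothetical, since 35GB of weights will not fit on a 24GB card, which is the capacity-versus-speed trade in miniature.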

The Russian-language report that surfaced this also said AMD demoed Qwen3-235B on the device. I couldn't confirm that specific model in AMD's materials, which cite a different benchmark set, so treat it as unverified.

Pre-orders are live now at Micro Center for both the Linux and Windows editions at the same $3,999.99. No firm delivery date yet, though Micro Center listed local pickup for July 10.

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
