AMD opened pre-orders for the Ryzen AI Halo developer platform on June 12, a Strix Halo mini PC aimed at running large language models locally. Lisa Su first showed it at CES in January. The pitch is 128GB of unified memory in a box smaller than a hardback book, priced at $3,999 through Micro Center.
What you actually get
The chip is the Ryzen AI Max+ 395: 16 Zen 5 cores, Radeon 8060S graphics, and a 50 TOPS NPU that, per most local-AI testers, sits idle during normal LLM work. Every configuration ships with 128GB of LPDDR5X-8000 and a 2TB SSD. AMD's product page claims support for models up to 200 billion parameters, depending on quantization.
That memory number is the whole point. Because the CPU and GPU draw from the same pool, you can load a 70B model in full precision without sharding it across cards. No consumer discrete GPU does that. The RTX 5090 caps at 32GB.
So why not just buy one now?
You can. Mini PCs on the same Strix Halo silicon have been shipping for months. The GMKtec EVO-X2 with 96GB of RAM sells for around $2,349, and Framework's 128GB desktop has hovered near $1,999. AMD's first-party box costs more, and the company is betting the difference buys you preinstalled playbooks, ROCm support out of the gate, and dual Windows and Linux booting. The original source pricing this near $1,800 is optimistic; the real floor for a comparable 128GB config is closer to two grand.
The comparison AMD wants you to make is against Nvidia's DGX Spark, which now retails for $4,699. Spark launched at the same $3,999 last year before memory shortages pushed the price up. AMD undercuts it by $700 and adds native Windows. Nvidia counters with a more mature software stack, which for a lot of developers still settles the argument.
The catch nobody puts on the box
Unified memory bandwidth runs around 256 GB/s. Discrete VRAM clears 1,000. So while the Halo can hold a giant model that a 4090 physically can't, it churns through tokens slower. Benchmarks put 70B inference around 14 tokens per second. Fine for tinkering and overnight jobs. Painful if you expected snappy chat. Image generation lands roughly 3 to 4 times slower than a 4090. This is a capacity play, not a speed one, and AMD's own framing leans on memory size rather than throughput for a reason.
The Russian-language report that surfaced this also said AMD demoed Qwen3-235B on the device. I couldn't confirm that specific model in AMD's materials, which cite a different benchmark set, so treat it as unverified.
Pre-orders are live now at Micro Center for both the Linux and Windows editions at the same $3,999.99. No firm delivery date yet, though Micro Center listed local pickup for July 10.




