Why 128GB?

At 80B, you could do 2 A6000s.
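For context, a quick back-of-the-envelope on why 80B parameters lands in that range (a sketch; the 1.2x overhead factor for KV cache/activations is my own assumption, real usage depends on context length):

```python
# Rough weights-plus-overhead footprint for an 80B-parameter model.
# The 1.2x overhead factor (KV cache, activations) is an assumption.
def model_gib(params_b: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return params_b * 1e9 * bytes_per_param * overhead / 2**30

print(round(model_gib(80, 2.0)))  # FP16: ~179 GiB - too big for 2x 48 GB cards
print(round(model_gib(80, 1.0)))  # 8-bit: ~89 GiB - fits on 2x A6000 or a 128 GB APU
print(round(model_gib(80, 0.5)))  # 4-bit: ~45 GiB - fits on a single 48 GB card
```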

What device has 128GB?

AMD Strix Halo / Ryzen AI Max+ (in the Asus Flow Z13 13-inch "gaming" tablet as well as the Framework Desktop) has 128 GB of shared APU memory.

Not quite. They have 128GB of RAM that can be allocated in the BIOS, up to 96GB to the GPU.

You don't have to statically allocate the VRAM in the BIOS. It can be dynamically allocated. Jeff Geerling found you can reliably use up to 108 GB [1].

[1]: https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...
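The trick in the linked post is raising the amdgpu GTT limit via kernel boot parameters instead of the BIOS. A sketch of the arithmetic (parameter names follow the post; the exact headroom on your machine may differ):

```shell
# TTM pages are 4 KiB each, so a ~108 GiB GTT budget is this many pages
# (a sketch based on the linked post; leave headroom for the OS):
PAGES=$(( 108 * 1024 * 1024 / 4 ))
echo "$PAGES"

# Then append to the kernel command line (e.g. GRUB_CMDLINE_LINUX) and reboot:
#   ttm.pages_limit=$PAGES ttm.page_pool_size=$PAGES
```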

Allocation is irrelevant. As an owner of one of these, you can absolutely use the full 128GB (minus OS overhead) for inference workloads.

Care to go into a bit more on machine specs? I am interested in picking up a rig to do some LLM stuff and not sure where to get started. I also just need a new machine; mine is 8 years old (with some gaming GPU upgrades) at this point and It's That Time Again. No biggie tho, just curious what a good modern machine might look like.

Those Ryzen AI Max+ 395 systems are all more or less the same. For inference you want the one with 128GB of soldered RAM. There are ones from Framework, GMKtec, Minisforum, etc. GMKtec used to be the cheapest, but with the rising RAM prices it's Framework now, I think. You can't really upgrade/configure them. For benchmarks look into r/localllama - there are plenty.

Minisforum and GMKtec also have Ryzen AI HX 370 mini PCs with 128GB (2x64GB) max LPDDR5. It's dirt cheap; you can get one barebone for ~€750 on Amazon (the 395 similarly retails for ~€1k)... It should be fully supported in Ubuntu 25.04 or 25.10 with ROCm for iGPU inference (the NPU isn't available ATM, AFAIK), which is what I'd use it for. But I just don't know how the HX 370 compares to e.g. the 395, iGPU-wise. I was thinking of getting one to run Lemonade and Qwen3-coder-next FP8, BTW... but I don't know how much RAM I should equip it with - shouldn't 96GB be enough? Suggestions welcome!

I benchmarked unsloth/Qwen3-Coder-Next-GGUF using the MXFP4_MOE (43.7 GB) quantization on my Ryzen AI Max+ 395 and I got ~30 tps. According to [1] and [2], the AI Max+ 395 is 2.4x faster than the AI 9 HX 370 (laptop edition). Taking all that into account, the AI 9 HX 370 should get ~13 tps on this model. Make of that what you will.

[1]: https://community.frame.work/t/ai-9-hx-370-vs-ai-max-395/736...

[2]: https://community.frame.work/t/tracking-will-the-ai-max-395-...
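The scaling math, spelled out (the 2.4x figure is from the linked Framework threads; treat the result as a rough extrapolation, not a benchmark of the HX 370):

```python
# Rough extrapolation from a measured number, not an HX 370 benchmark.
max_395_tps = 30.0   # measured: Qwen3-Coder-Next MXFP4 on Ryzen AI Max+ 395
speedup = 2.4        # AI Max+ 395 vs AI 9 HX 370, per links [1] and [2]
hx_370_tps = max_395_tps / speedup
print(round(hx_370_tps, 1))  # 12.5 -> call it ~13 tps
```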

Thanks! I'm... unimpressed.

The Ryzen 370 lacks quad-channel RAM. Stay away.

The Ryzen AI HX 370 is not what you want; you need a Strix Halo APU with unified memory.

maxed out Framework Desktop

Keep in mind most of the Strix Halo machines are limited to 10GbE networking at best.

You can use a separate network adapter with RoCEv2/RDMA support, like the Intel E810.

Most Ryzen 395 machines don't have a PCIe slot for that, so you're looking at an adapter off an M.2 slot or Thunderbolt (not sure how well that will work; possibly OK at 10Gb). Minisforum has a couple of newly announced products, and I think the Framework Desktop's motherboard can do it if you put it in a different case - that's about it. Hopefully the next generation has Gen5 PCIe and a few more lanes.

DGX Spark and any A10 devices, Strix Halo with the max memory config, several Mac Mini/Mac Studio configs, the HP ZBook Ultra G1a, most servers.

If you're targeting end-user devices, then a more reasonable target is 20GB of VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range (orders of magnitude more than at 128GB).

By A6000, do you mean the older Ampere-generation model? 48 GB GDDR6, released 2020 [1]. Can you even buy those new still?

[1] https://www.techpowerup.com/gpu-specs/rtx-a6000.c3686

That's the maximum you can get for $3k-$4k with the Ryzen AI Max+ 395 and Apple's M-series Mac Studios. They're far cheaper than dedicated GPUs.

Mac Studios or Strix Halo. GPT-OSS 120b, Qwen3-Next, Step 3.5-Flash all work great on a M1 Ultra.

All the GB10-based devices -- DGX Spark, Dell Pro Max, etc.

My guess is Mac M series.