Only for chat sessions, not for agentic coding. It's just too slow to be practical (10 minutes to answer a simple question about a 2k LoC project - and that's with a 5070 addon card).

This article is about an MoE model with only 4B active parameters; it shouldn't take 10 minutes to answer a question about a small project.

I measured a 4-bit quant of this model at 1300 t/s prefill and ~60 t/s decode on a Ryzen AI Max+ 395.
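For a rough sense of what those rates mean per request, here is a back-of-the-envelope sketch. The two rates are the ones quoted above; the token counts are assumptions picked purely for illustration (a small project plus agent system prompt could plausibly land anywhere around this size), and the function name is made up for this example.

```python
def estimate_response_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough latency: time to process the prompt plus time to generate the reply."""
    prefill_s = prompt_tokens / prefill_tps   # prompt is processed at the prefill rate
    decode_s = output_tokens / decode_tps     # reply is generated at the decode rate
    return prefill_s + decode_s

# Rates from the measurement quoted above.
PREFILL_TPS = 1300
DECODE_TPS = 60

# ASSUMED token counts, for illustration only (not measured).
ASSUMED_PROMPT_TOKENS = 30_000
ASSUMED_OUTPUT_TOKENS = 500

total = estimate_response_seconds(
    ASSUMED_PROMPT_TOKENS, ASSUMED_OUTPUT_TOKENS, PREFILL_TPS, DECODE_TPS
)
print(f"{total:.1f} s")  # prints "31.4 s"
```

An agentic session makes several such round trips per question, so the real wall-clock time is a multiple of this, but under these assumptions it stays well short of 10 minutes.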

Doesn't the Framework Desktop have a Ryzen AI Max+ 395? That's a unified memory architecture like the Macs.

Ah, forgot to add: it's not really "unified", since you have to explicitly specify your allocations. You may have a reasonably good 48 GB chunk assigned to the GPU, but that DDR5 is 5-10 times slower than GDDR/HBM, and the GPU itself isn't stellar.
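To put the bandwidth point in numbers: decode is usually memory-bound, so a common rule of thumb is that tokens/s can't exceed memory bandwidth divided by the bytes of weights read per token. A minimal sketch under that assumption follows. The bandwidth figures are round placeholder numbers chosen to reflect a roughly 8x gap, not measurements of any specific device, and the helper name is hypothetical. It ignores KV cache reads, compute limits, and other overhead, so real numbers will be lower.

```python
def decode_tps_upper_bound(active_params_billions, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token

# ASSUMED round bandwidth figures (GB/s), placeholders for illustration only.
SLOW_MEMORY_GB_S = 100   # stand-in for system DDR5
FAST_MEMORY_GB_S = 800   # stand-in for GDDR-class VRAM

# A hypothetical dense 27B model at 4 bits reads about 13.5 GB per token.
dense_slow = decode_tps_upper_bound(27, 4, SLOW_MEMORY_GB_S)
dense_fast = decode_tps_upper_bound(27, 4, FAST_MEMORY_GB_S)

# An MoE with 4B active parameters at 4 bits reads about 2 GB per token.
moe_slow = decode_tps_upper_bound(4, 4, SLOW_MEMORY_GB_S)

print(f"dense 27B, slow memory: {dense_slow:.1f} t/s")  # 7.4 t/s
print(f"dense 27B, fast memory: {dense_fast:.1f} t/s")  # 59.3 t/s
print(f"MoE 4B active, slow memory: {moe_slow:.1f} t/s")  # 50.0 t/s
```

This is why the active parameter count matters so much on slower memory: the same bandwidth goes a lot further when only a few billion weights are touched per token.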

So, Framework laptops are great for chatting but nearly useless for agentic coding.

My Radeon W7900 answers a question ("what is this project") in 2 minutes. My Framework 16 takes around 11 minutes with the 5070 addon and around 23 without it (Qwen 3.5 27B, Claude Code).

That's discrete DDR5; it's not as fast as regular VRAM.