Nice! Cheap RK3588 boards come with 16GB of LPDDR5 RAM these days and offer significantly better performance than the Pi 5 (and are often cheaper).

I get 8.2 tokens per second on a random orange pi board with Qwen3-Coder-30B-A3B at Q3_K_XL (~12.9GB). I need to try two of them in parallel ... should be significantly faster than this even at Q6.
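One way to try that with stock llama.cpp is its RPC backend; a rough sketch, assuming both boards were built with -DGGML_RPC=ON and sit on the same LAN (the model filename, IP, and port below are placeholders):

    # on the second board: expose it as an RPC backend
    ./build/bin/rpc-server -p 50052

    # on the first board: split the model across both machines
    ./build/bin/llama-cli -m Qwen3-Coder-30B-A3B-Q3_K_XL.gguf \
        --rpc 192.168.1.42:50052 -p "write a hello world in C"

Whether that actually beats a single board depends on how much the network link eats into it.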

> a random orange pi board with Qwen3-Coder-30B-A3B at Q3_K_XL (~12.9GB)

Fantastic! What are you using to run it, llama.cpp? I have a few extra OPi5s sitting around that would love some extra usage.

Yup! Build it and ignore KleidiAI, Vulkan, etc. I’ve found that a clean CPU-only build is optimal
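Roughly what that looks like, if it helps. This is a sketch and the exact CMake option names can shift between llama.cpp versions (both backends should be disabled by default in recent builds anyway):

    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    # plain CPU build: leave the Vulkan and KleidiAI backends off
    cmake -B build -DGGML_VULKAN=OFF -DGGML_CPU_KLEIDIAI=OFF
    cmake --build build --config Release -j8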

Is that using the NPU on that board? I know it's possible to use those too.

It is possible (there's a superb subreddit on it), but converting a modern model is painful and it takes ages for new ones to be supported. The NPU is energy efficient but no faster than the CPU for token generation (and has lousy software support).

I’m mostly interested in the NPU to run a vision head in parallel with an LLM to speed up time to first token with VLMs (kinda want to turn them into privacy-safe vision devices for consumer use cases)

Since my comment, I remembered I had an RK3588 board, a Rock 5B, and tried llama.cpp on CPU there, and performance was not amazing. But I also realized that board has LPDDR4X, so don't get the cheapest RK3588 boards. My Orange Pi 5 is actually worse: it has plain LPDDR4. Looking at the rest of Orange Pi's line-up, they don't actually have a board with both LPDDR5 and 32GB; you get LPDDR5 only at 16GB, or 32GB only with LPDDR4(X).

Using llama-bench and Llama 2 7B Q4_0, like https://github.com/ggml-org/llama.cpp/discussions/10879, how does yours compare? I'm also comparing it with a few Ryzen 5 3000 series mini-PCs under $150; that class of machine gets 8 t/s on that list, which matches what I've gotten myself.
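For anyone reproducing: the default llama-bench run covers pp512 and tg128, and the model filename here is a placeholder for whatever your Q4_0 GGUF is called.

    # default benchmark: 512-token prompt processing + 128-token generation
    ./build/bin/llama-bench -m llama-2-7b.Q4_0.gguf

    # fewer threads often helps on big.LITTLE chips like the RK3588;
    # the scheduler tends to land them on the four A76 cores
    ./build/bin/llama-bench -m llama-2-7b.Q4_0.gguf -t 4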

With my Rock 5B and this bench, I get 3.65 t/s. On my Orange Pi 5 (not the 5B) with 8GB of LPDDR4 (not X), I get 2.44 t/s.