It's linking llama.cpp and using Metal, so I presume it's GPU/CPU only (nothing on the NPU).

I'm more than a bit overwhelmed with what's on my plate and have completely missed the boat on, e.g., understanding what MLX actually is. I'd be really curious for a thought dump if you have opinionated experience/thoughts here. (E.g., it never crossed my mind until now that you might get better results on the NPU than on the GPU.)

LM Studio seems to have MLX support on Apple silicon, so you could quickly get a feel for whether it helps in your case: https://github.com/lmstudio-ai/mlx-engine
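
If you'd rather skip LM Studio and poke at MLX directly, here's a minimal sketch using the mlx-lm Python package (pip install mlx-lm), assuming its load/generate API hasn't shifted; the model repo name is just an example from the mlx-community org on Hugging Face, swap in whatever you're testing:

    # Minimal MLX text-generation sketch (assumes: pip install mlx-lm,
    # Apple silicon, and the current mlx_lm load/generate API).
    from mlx_lm import load, generate

    # Downloads a pre-quantized model from Hugging Face on first run;
    # the repo name below is an example, not a recommendation.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

    # verbose=True prints tokens/sec, so you can compare throughput
    # against your llama.cpp/Metal numbers on the same machine.
    text = generate(model, tokenizer, prompt="What is MLX?",
                    max_tokens=200, verbose=True)
    print(text)

One caveat on the NPU question: MLX also runs on the GPU via Metal (plus unified memory), not on the Neural Engine, so this comparison tells you about the framework difference, not GPU vs. NPU.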