Support for this landed in llama.cpp recently, if anyone is interested in running it locally.
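
For anyone who hasn't used llama.cpp before, a minimal sketch of building and running it (assumes a recent checkout with the support merged; `model.gguf` is a placeholder for whichever GGUF quantization you download):

```sh
# clone and build (CPU-only by default)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# run a one-off prompt; model.gguf is a placeholder for your downloaded weights
./build/bin/llama-cli -m model.gguf -p "Hello" -n 128

# or serve an OpenAI-compatible API on localhost
./build/bin/llama-server -m model.gguf --port 8080
```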