Personally I'd start with llamafile [0], then move to compiling your own llama.cpp.
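
To illustrate the llamafile route, a minimal sketch (model.llamafile is a placeholder; grab a real .llamafile build from the links in the project README):

    chmod +x model.llamafile   # the download is a single self-contained executable
    ./model.llamafile          # serves a local web UI and an OpenAI-compatible API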

It's not as bad as you might think to compile llama.cpp for your target architecture and spin up an OpenAI-compatible API endpoint. The bundled llama-server can even download models from Hugging Face for you.
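
A rough sketch of the llama.cpp route (the repo passed to -hf is just an example; substitute any GGUF model on Hugging Face):

    # build from source
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build && cmake --build build --config Release

    # start an OpenAI-compatible server; -hf fetches the model for you
    ./build/bin/llama-server -hf ggml-org/gemma-3-1b-it-GGUF --port 8080

    # query it with the standard chat completions API
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages":[{"role":"user","content":"Hello"}]}'

Since both llamafile and llama-server expose the usual /v1 routes, existing OpenAI client libraries work too; just point the base URL at localhost.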

[0]: https://github.com/mozilla-ai/llamafile