I saw your updated post. Switch over to llama.cpp and look up the recommended quants and settings. A good place for this info is /r/localllama

Yep! I'm currently trying vLLM, then I'll give llama.cpp a try too