KTransformers is the state of the art for such a setup. Really good performance!

https://github.com/kvcache-ai/ktransformers