This is the state of the art for such a setup. Really good performance!
https://github.com/kvcache-ai/ktransformers