Yeah, I was a bit baffled by the author complaining about cache prefixes getting destroyed when more than one user hit the model, but then continuing to use llama.cpp instead of switching to vLLM.