Hacker News

Yeah, I was a bit baffled by the author complaining about cached prefixes getting destroyed whenever more than one user hit the model, but then sticking with llama.cpp instead of switching to vLLM.