ik_llama is almost always faster when tuned. Untuned, though, I've found the two to be very similar in performance, with mixed results as to which one comes out ahead.
But vLLM and SGLang tend to be faster than both of those.