But was that with batching? It makes a big difference: you can run many requests in parallel on the same card when doing LLM inference.
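For what it's worth, here's a rough sketch of batched vs. one-at-a-time generation using Hugging Face transformers — the model (gpt2), batch size, and prompts are all placeholder assumptions, not whatever the original benchmark used:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for the actual model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

    prompts = ["Tell me a joke."] * 32  # 32 requests served together

    # Batched: each decode step is one forward pass covering all 32
    # requests, so the GPU's compute is amortized across the batch.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    model.generate(**inputs, max_new_tokens=64,
                   pad_token_id=tokenizer.eos_token_id)

    # Unbatched: the same 32 requests decoded sequentially take roughly
    # batch_size times as many forward passes for similar wall-clock
    # work per pass, which is where the throughput gap comes from.
    for p in prompts:
        single = tokenizer(p, return_tensors="pt").to("cuda")
        model.generate(**single, max_new_tokens=64,
                       pad_token_id=tokenizer.eos_token_id)

Per-request latency barely moves, but aggregate tokens/sec scales with the batch until you hit memory or compute limits — so a single-request number can badly understate what the card can do.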