But was that with batching? It makes a big difference: you can run many requests in parallel on the same card when doing LLM inference.
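For what it's worth, here's a rough sketch of batched vs. one-at-a-time generation using Hugging Face transformers — the model (gpt2), batch size, and prompts are all placeholder assumptions, not whatever the original benchmark used:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for the actual model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

    prompts = ["Tell me a joke."] * 32  # 32 requests served together

    # Batched: each decode step is one forward pass covering all 32
    # requests, so the GPU's compute is amortized across the batch.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    model.generate(**inputs, max_new_tokens=64,
                   pad_token_id=tokenizer.eos_token_id)

    # Unbatched: the same 32 requests decoded sequentially take roughly
    # batch_size times as many forward passes for similar wall-clock
    # work per pass, which is where the throughput gap comes from.
    for p in prompts:
        single = tokenizer(p, return_tensors="pt").to("cuda")
        model.generate(**single, max_new_tokens=64,
                       pad_token_id=tokenizer.eos_token_id)

Per-request latency barely moves, but aggregate tokens/sec scales with the batch until you hit memory or compute limits — so a single-request number can badly understate what the card can do.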