Hacker News

dr_kiszonka 10 minutes ago

Thank you for explaining. Do you think there are still opportunities for stack optimizations to meaningfully speed up inference on single consumer-grade GPUs?