Hacker News

tcdent 5 hours ago

Inference is run on shared hardware already, so they're not giving you the full bandwidth of the system by default. This most likely just allocates more resources to your request.
