Hacker News

Nobody running locally keeps their GPUs at full load all day. You'd only see that if you were training, and we are talking about inference. I limit my GPUs to 300 W, and you can limit them down to 200 W. Since not everything is on the GPU, the bottleneck is between the CPU and system RAM, so the GPUs don't even get to spike: I see 160-180 W for each GPU during inference. So redo your calculation. Figure about 6 hours of daily inference, and we are down to roughly $125 a year. Thanks again for your speculation.
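If anyone wants to redo the math themselves, here is a rough sketch of the arithmetic in Python. The per-GPU draw and the 6 hours a day come from the numbers above; the GPU count and the electricity rate are placeholders I made up for illustration (neither is stated here), so plug in your own.

```python
def annual_cost_usd(watts_per_gpu, gpu_count, hours_per_day, usd_per_kwh):
    """Yearly electricity cost for GPUs drawing a steady wattage while in use."""
    # Watts -> kilowatts, then multiply by hours of use per year to get kWh.
    kwh_per_year = watts_per_gpu * gpu_count / 1000 * hours_per_day * 365
    return kwh_per_year * usd_per_kwh


if __name__ == "__main__":
    GPU_COUNT = 2        # hypothetical: the rig size is not given above
    USD_PER_KWH = 0.15   # hypothetical: local electricity rate
    HOURS_PER_DAY = 6    # from the comment: about 6 hrs of daily inference

    # 160 W and 180 W are the observed inference draw; 300 W is the power cap.
    for watts in (160, 180, 300):
        cost = annual_cost_usd(watts, GPU_COUNT, HOURS_PER_DAY, USD_PER_KWH)
        print(f"{watts} W per GPU: ${cost:.2f} per year")
```

The point is that the result scales linearly with wattage and hours, so assuming full load around the clock instead of 160-180 W for 6 hours inflates the estimate several times over.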