> In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.

They can repurpose those nodes for training when they aren't needed for inference. Or, if they're renting public cloud nodes, they can simply shut them down off-peak and stop paying.