This is a great article, but it doesn't appear to model H100 downtime in the $2/hr costs. It assumes that OpenAI and Anthropic can match demand for inference to their supply of H100s perfectly, 24/7, in all regions. Maybe you could argue that the idle H100s are being used for model training - but that's different from the article's argument that inference is economically sustainable in isolation.
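To put rough numbers on it: at $2/hr the GPU costs the same whether it's busy or idle, so the effective cost per busy hour is $2 divided by utilization. A back-of-envelope sketch (the utilization rates here are made up for illustration, not from the article):

    # Effective cost per *utilized* GPU-hour at various utilization rates.
    # The $2/hr figure is from the discussion; the rates are hypothetical.
    HOURLY_COST = 2.00  # $/hr per H100, paid whether busy or idle

    for utilization in (1.0, 0.8, 0.6, 0.4):
        effective = HOURLY_COST / utilization
        print(f"{utilization:>4.0%} utilized -> ${effective:.2f} per busy GPU-hour")

At 60% utilization the real per-hour cost is already ~$3.33, which is the kind of gap the article's margin math would need to absorb.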
Not really - that's why they sell the Batch API at considerably lower cost than the normal API.
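For anyone who hasn't used it, the flow is roughly this - a minimal sketch using the OpenAI Python SDK (the file name, model, and request contents are placeholders):

    from openai import OpenAI

    client = OpenAI()

    # requests.jsonl holds one request per line, e.g.:
    # {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
    #  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}}
    batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

    # The 24h completion window is exactly the slack that lets the provider
    # schedule these requests onto otherwise-idle capacity, hence the discount.
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) for results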
There are also probably all kinds of enterprise deals, beyond the PAYG batch APIs, where customers are okay with high latency (hours or more).