Hacker News

The purpose is to control the total amount of requests they need to handle in a given timeframe. If everyone could use up their whole weekly limit in 5 hours, many would do so, thus pushing the GPU/TPU clusters to or above their capacity limits.