> It seems unlikely it’s just prioritization
Why does this seem unlikely? I have no doubt they are optimizing all the time, including inference speed, but why couldn't this particular lever be driven entirely by skipping the queue? It's an easy way to generate more money.
Yes, it's 100% prioritization. Through that, it's also likely running on more GPUs at once, but that's an artifact of prioritization at the datacenter level. Any task coming into an AI datacenter atm is split into fairly fine-grained chunks of work and added to queues to be processed.
When you add a job with high priority, all those chunks will be processed off the queue first by each and every GPU that frees up. It probably leads to more parallelism, but... it's the prioritization that led to this happening. It's better to think of this as prioritization of your job leading to the perf improvement.
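A toy sketch of the mechanism described here, assuming a much simpler model than any real datacenter: every chunk takes exactly one time step on one GPU, there is a single shared priority queue, and nothing is preempted. The tier names and `simulate` are hypothetical. The point it illustrates is that the speedup comes from waiting less, not from the GPUs running your chunks any faster.

```python
import heapq

# Hypothetical tiers: lower number is served first (heapq is a min-heap).
HIGH, STANDARD = 0, 1


def simulate(jobs, num_gpus):
    """jobs: list of (name, priority, num_chunks), all queued at time 0.
    Returns {name: time step in which the job's last chunk finishes}."""
    queue = []
    seq = 0  # tie-breaker: FIFO order among chunks with the same priority
    for name, priority, num_chunks in jobs:
        for _ in range(num_chunks):
            heapq.heappush(queue, (priority, seq, name))
            seq += 1

    finish = {}
    t = 0
    while queue:
        t += 1
        # Every GPU that is free this step pulls the highest-priority chunk.
        for _ in range(min(num_gpus, len(queue))):
            _, _, name = heapq.heappop(queue)
            finish[name] = t  # overwritten until the job's last chunk runs
    return finish


print(simulate([("batch", STANDARD, 12), ("you", STANDARD, 4)], num_gpus=4))
print(simulate([("batch", STANDARD, 12), ("you", HIGH, 4)], num_gpus=4))
```

With 4 GPUs, your 4-chunk job finishes at step 4 when it sits behind the 12-chunk batch job, and at step 1 when it is marked high priority, because its chunks are the first thing every freed GPU picks up.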
Here's a good blog post for anyone interested that talks about prioritization and job scheduling. It's not quite at the datacenter level, but the concepts are the same. Basically everything is thought of as a pipeline. All training jobs are low pri (they take months to complete in any case), customer requests are mid pri, and then there are options for high pri. Everything in an AI datacenter is thought of in terms of 'flow'. Are there any bottlenecks? Are the pipelines always full and the expensive hardware always 100% utilized? Are the queue backlogs big enough to ensure full utilization at every stage?
https://www.aleksagordic.com/blog/vllm
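A rough sketch of the three-tier "flow" idea, under the same simplifying assumptions as a toy model (one time step per chunk, one shared queue, no preemption). The tier constants and `run` are hypothetical names, not anything from the linked post. A deep backlog of low-priority training chunks fills whatever capacity customer traffic leaves over, so GPUs never sit idle, while higher tiers still jump ahead as soon as they arrive.

```python
import heapq

# Hypothetical tiers from the comment above (lower number is served first).
PRIORITY, CUSTOMER, TRAINING = 0, 1, 2


def run(arrivals, training_chunks, num_gpus, steps):
    """arrivals: {step: [(tier, request_name, num_chunks), ...]}.
    training_chunks: size of the low-priority training backlog at step 0.
    Returns (GPU utilization, {request_name: finish_step})."""
    queue, seq = [], 0
    for _ in range(training_chunks):
        heapq.heappush(queue, (TRAINING, seq, "training"))
        seq += 1

    busy, finish = 0, {}
    for t in range(1, steps + 1):
        # New requests join the shared queue at the start of their step.
        for tier, name, n in arrivals.get(t, []):
            for _ in range(n):
                heapq.heappush(queue, (tier, seq, name))
                seq += 1
        # Work-conserving: a GPU only idles if the queue is empty.
        for _ in range(min(num_gpus, len(queue))):
            _, _, name = heapq.heappop(queue)
            busy += 1
            if name != "training":
                finish[name] = t
    return busy / (num_gpus * steps), finish


traffic = {2: [(CUSTOMER, "chat", 3), (PRIORITY, "fast", 2)]}
print(run(traffic, training_chunks=100, num_gpus=4, steps=5))
print(run(traffic, training_chunks=0, num_gpus=4, steps=5))
```

In this example the request latencies are identical with or without the training backlog (the priority request finishes at step 2, the customer request at step 3), but utilization drops from 100% to 25% once the backlog is gone, which is the "are the queue backlogs big enough" question in miniature.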
>Yes it's 100% prioritization
Amazon Bedrock has a similar feature called "priority tier": you get faster responses at 1.75x the price. And they explicitly say in the docs "priority requests receive preferential treatment in the processing queue, moving ahead of standard requests for faster responses".
Until everyone buys it. Like a fast pass at an amusement park where the fast line is still two hours long.
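A toy way to see the fast-pass point, assuming single-chunk jobs, one shared queue, and that everyone else queued before you (`your_finish_step` and the tier constants are hypothetical). Priority only moves you ahead of the people who didn't pay for it, so the advantage shrinks as more of the queue buys the same tier.

```python
import heapq

# Hypothetical tiers: lower number is served first.
HIGH, STANDARD = 0, 1


def your_finish_step(your_tier, other_tiers, num_gpus):
    """Every job is a single chunk and the other jobs were queued before yours.
    Returns the time step in which your chunk runs."""
    queue = [(tier, seq, "other") for seq, tier in enumerate(other_tiers)]
    queue.append((your_tier, len(other_tiers), "you"))
    heapq.heapify(queue)
    t = 0
    while True:  # always terminates: "you" is in the queue
        t += 1
        for _ in range(min(num_gpus, len(queue))):
            if heapq.heappop(queue)[2] == "you":
                return t


for buyers in (0, 4, 20, 40):
    others = [HIGH] * buyers + [STANDARD] * (40 - buyers)
    print(buyers, "of 40 others on the fast tier ->",
          your_finish_step(HIGH, others, num_gpus=4))
```

With 40 jobs ahead of you on 4 GPUs, paying for the fast tier gets you served in step 1 if nobody else pays, step 2 if 4 others do, and step 11 if all 40 do, which is the same wait as not paying at all.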
At 6x the cost, and it requiring you to pay full API pricing, I don’t think this is going to be a concern.
It's a good way to squeeze extra out of a bunch of people without actually raising prices.