Yes, and OpenAI has enough financial resources to build a bespoke abstraction layer with multiple performant, provider-specific implementations.
Regardless of whether they bring in the Kubernetes complexity.
(Internal codename: Goober Yetis.)
Why would they create a bespoke abstraction layer instead of just relying on k8s?
There is only pain down the path of recreating it: it will end up almost as complex as k8s, and it will be hell to hire and train for. Better to just use something battle-tested with a large pool of people already trained on it; even better, their own LLM has gobbled up practically everything ever written about k8s, which helps their engineers. K8s's complexity accumulated for reasons discovered while growing the stack, reasons anyone building a bespoke equivalent would likely run into as well, and it's fairly modular anyway, since you can pick and choose the parts you actually need for your cluster.
Wasting manpower on recreating a bespoke Kubernetes doesn't sound great for a company burning billions per quarter; it's just more waste.
I did try to say explicitly that Kubernetes could be used.
And, given their unusual needs and scale, there will probably be some kind of bespoke abstraction, whether it's an SDK or a document, that says: this is the subset of things you should be using, and here is how to use them, so that we can practically deploy our very unusual setup across different providers and customer facilities.
OpenAI has the resources to define that abstraction, and make it work well across multiple providers.
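To make it concrete, here's the kind of thin layer I mean, as a rough sketch (every name here is made up, not anything OpenAI actually runs): a small policy check that only admits an approved subset of Kubernetes objects, so the same manifests deploy the same way on every provider's cluster.

    # Hypothetical "approved subset" check, Python just for illustration.
    # The point: the bespoke layer is mostly policy and glue, not a k8s replacement.
    APPROVED_KINDS = {"Deployment", "Service", "ConfigMap", "Job"}   # made-up policy
    APPROVED_NODE_SELECTOR_KEYS = {"accelerator"}                    # e.g. only pin to GPU pools

    def validate_manifest(manifest: dict) -> list[str]:
        """Return policy violations for a single Kubernetes manifest (as a dict)."""
        errors = []
        kind = manifest.get("kind")
        if kind not in APPROVED_KINDS:
            errors.append(f"kind {kind!r} is not in the approved subset")
        pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
        for key in pod_spec.get("nodeSelector", {}):
            if key not in APPROVED_NODE_SELECTOR_KEYS:
                errors.append(f"nodeSelector key {key!r} is not portable across providers")
        return errors

    # Usage idea: run this in CI before `kubectl apply`, identically for every provider.

That's the whole "abstraction" in spirit: a document saying what's allowed, plus tooling that enforces it.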
> And, given their unusual needs and scale, there will probably be some kind of bespoke abstraction, whether it's an SDK or a document, that says: this is the subset of things you should be using, and here is how to use them, so that we can practically deploy our very unusual setup across different providers and customer facilities.
Now I understand what you meant. I wouldn't have classified this as a bespoke solution, since it's more of an operational guide plus tooling; most places I've worked at that ran k8s had something similar (a tech radar / best-practices list of sorts). When I read "bespoke" I thought "custom built", which I don't think would be the right approach; running k8s and customising the control plane is quite common.
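For instance, a lot of what gets called control-plane customisation is just a validating admission webhook sitting in front of the API server. Here's a bare-bones sketch (made-up policy; a real deployment would also need TLS and a ValidatingWebhookConfiguration pointing at it, and this assumes the webhook is registered for Pod CREATE requests):

    # Minimal validating admission webhook sketch in Python (illustrative only).
    # The API server POSTs an AdmissionReview; we allow or reject the object.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import json

    ALLOWED_REGISTRY = "registry.internal.example/"  # hypothetical internal registry

    class AdmissionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            review = json.loads(self.rfile.read(length))
            request = review["request"]
            containers = request["object"]["spec"].get("containers", [])
            allowed = all(c["image"].startswith(ALLOWED_REGISTRY) for c in containers)
            response = {
                "apiVersion": "admission.k8s.io/v1",
                "kind": "AdmissionReview",
                "response": {
                    "uid": request["uid"],
                    "allowed": allowed,
                    "status": {"message": "" if allowed else "image must come from the internal registry"},
                },
            }
            body = json.dumps(response).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), AdmissionHandler).serve_forever()

You bolt policy onto the existing control plane instead of rebuilding it.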
I trust that OpenAI engineers are smart enough to build a replacement for Kubernetes. I also think they are smart enough to build better, more innovative desks, tables & chairs. But neither of those would be a good use of their time.
There's a reason company vision & mission statements exist. OpenAI's mission is not to build the next k8s, but to build better AI models.