Interesting that their first product is an infrastructure play. Is it really so hard to set up a fine-tuning pipeline yourself that a $12 billion startup with unlimited hype needs to be offering it? Maybe they have figured, whether correctly or not, that building AI tooling is going to be more lucrative than building the AI itself.
What's interesting to me is that this can serve as a foundation for products, theirs or other people's, that combine real-time RL rewards with fine-tuning to improve the model. I see a lot more potential there than in the standard ChatGPT-wrapper paradigm of tweaking the prompt or harness, which is far more constrained.
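To make that concrete, here's a toy sketch of the kind of loop I mean. Every name here is hypothetical, not any vendor's actual API: serve completions, score them with some real-time reward signal (clicks, thumbs-up, task success), and periodically fine-tune on the highest-reward examples.

```python
import random

def generate(model_weights, prompt):
    # Stand-in for sampling a completion from the current model.
    return f"{prompt} -> answer#{random.randint(0, 9)}"

def reward(completion):
    # Stand-in for a real-time signal arriving from production traffic.
    return 1.0 if completion.endswith(("3", "7")) else 0.0

def fine_tune(model_weights, examples):
    # Stand-in for a fine-tuning call: fold the good examples back in.
    return model_weights + [completion for completion, _ in examples]

def rl_feedback_loop(prompts, rounds=3, batch_size=8, keep_top=4):
    model_weights = []
    for _ in range(rounds):
        # Collect a batch of (completion, reward) pairs from live use.
        batch = [(c, reward(c)) for c in
                 (generate(model_weights, random.choice(prompts))
                  for _ in range(batch_size))]
        # Keep only the highest-reward completions for the next update.
        batch.sort(key=lambda pair: pair[1], reverse=True)
        model_weights = fine_tune(model_weights, batch[:keep_top])
    return model_weights

weights = rl_feedback_loop(["summarize", "classify"])
print(len(weights))  # keep_top examples folded in per round
```

The point of the sketch is the shape of the loop, not the details: the reward and fine-tune steps are where a real product would plug in actual infrastructure.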
OpenAI has had a fine-tuning API since GPT-3.5, and a reinforcement fine-tuning API since last year.