Oh I wrote up a post on X on this exact question! https://x.com/danielhanchen/status/1979389893165060345?s=20

1. Cursor used online RL to get +28% approval rate: https://cursor.com/blog/tab-rl

2. Vercel used RFT for their AutoFix model for v0: https://vercel.com/blog/v0-composite-model-family

3. I think Perplexity's Sonar for Deep Research Reasoning was a finetuned model: https://docs.perplexity.ai/docs/getting-started/overview

4. DoorDash uses LoRA and QLoRA for a "Generalized Attribute Extraction model": https://careersatdoordash.com/blog/unleashing-the-power-of-l...

5. NASA flood water detection: https://earthdata.nasa.gov/news/nasa-ibm-openly-release-geospatial-ai-foundation-model-nasa-earth-observation-data6

6. Online RL for robotics - imagine teaching a robot in the future via some mini finetuning

7. OpenAI's RFT page has more: https://developers.openai.com/api/docs/guides/rft-use-cases

8. For larger models - https://www.mercor.com/blog/expert-data-drives-model-perform...