I find OpenAI's speech-to-text model the best of the lot. It handles both my and my 5-year-old daughter's Indian accents pretty well.
I wonder if they run the STT model's output through the current model (the one we're chatting with) as a final pass - since the text seems to be well aligned with the current conversation context.
For long prompts, I often dictate to the OpenAI web/mobile app and copy-paste the transcribed text into Claude / Gemini :)