Plain speech-to-text seems like it's largely solved on pretty much every compute platform. However, I have found a huge gap between getting individual words transcribed and getting formatted text that is ready for an editor or for further processing.

If you look at how authors dictate their works (which they have done for millennia), just getting the words written down is only the first step, and it's by far the easiest. I have been helping build a tool, https://bookscribe.ai, that not only does the transcription but can then post-process it to make it actually usable for longer-form content.

Aqua Voice does (at least some of) that as well.