Have you tried Whisper itself? It's open-weights.

One of the features of the project posted above is "transformations" that you can run on transcripts: it feeds the text into an LLM to clean it up. If you're willing to pay for the tokens, I think you could not only remove filler words but probably even get the semantically aware editing (corrections) you're talking about.

^Yep, unfortunately, the best option right now seems to be piping the output into another LLM for cleanup, which we try to help you do in Whispering. Recent transcription models don't have very good built-in cleanup, with Whisper offering only the very weak "prompt" parameter. This is probably by design, to keep the models lean, specialized, and performant at their core task.
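For what it's worth, the mechanical part of that cleanup (stripping common fillers) doesn't even need an LLM; a regex pass before the LLM call handles it. A minimal sketch, assuming a plain-text transcript (the filler list and `strip_fillers` name are illustrative, not from Whispering):

```python
import re

# Common spoken fillers to strip; this list is illustrative, not Whispering's actual set.
FILLERS = ["um", "uh", "you know", "i mean"]

def strip_fillers(transcript: str) -> str:
    """Remove simple filler words/phrases from a raw transcript.

    Semantically aware corrections (e.g. "no wait, I meant noon")
    still need an LLM pass; this only handles mechanical fillers.
    """
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[,.]?\s*"
    cleaned = re.sub(pattern, "", transcript, flags=re.IGNORECASE)
    # Collapse doubled spaces left behind by removals
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_fillers("Um, so, uh, the meeting is, you know, at noon."))
```

It leaves stray commas behind and can't fix self-corrections, which is exactly where the LLM transformation step earns its tokens.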

By "try to help," do you mean that it currently does so, or is that functionality on the way?