Thanks for sharing! Transcription suddenly became useful to me when LLMs started being able to generate somewhat useful code from natural language. (I don't think anybody wants to dictate code.) Now my workflow is similar to yours.
I have mixed feelings about OS integration. I'm currently working on a project that uses a foot pedal for push-to-transcribe: it speaks USB HID, so it works anywhere without host software, and it doesn't clobber my clipboard. That said, an app like yours really opens up some cool possibilities! For example, with a keyboard-emulation strategy like mine, I can't easily adjust the text prompt/hint for the transcription model.
With an application running on the host, though, you can inject relevant context/prompts/hints (either for transcription or during your post-transformations). These might be provided intentionally by the user, or, if they really trust your app, the context could even be scraped from whatever is currently on-screen (or from the files currently being worked on).
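To make that concrete, here's a minimal sketch of the context-injection idea. It assumes a Whisper-style model that accepts a text hint (e.g. the `initial_prompt` parameter in openai-whisper's `transcribe`); the context sources (active filename, visible symbols) and the `build_hint` helper are hypothetical, just for illustration.

```python
# Sketch: pack host-side context into a short hint string that a
# Whisper-style ASR model can bias its vocabulary toward.
# build_hint and its inputs are illustrative, not a real API.

def build_hint(active_file: str, symbols: list[str], max_len: int = 200) -> str:
    """Assemble likely vocabulary into a short transcription prompt."""
    hint = f"Editing {active_file}. Terms: " + ", ".join(symbols)
    return hint[:max_len]  # keep the hint short; these prompts are length-limited

hint = build_hint("server.py", ["asyncio", "websocket", "FastAPI"])
# Then, roughly (with openai-whisper):
# result = model.transcribe(audio, initial_prompt=hint)
print(hint)  # → Editing server.py. Terms: asyncio, websocket, FastAPI
```

The point is just that only a host-side app has access to this kind of context; a pure keyboard emulator never sees it.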
Another thing I've thought about doing is using a separate keybind (or button/pedal) that appends the transcription directly to a running notes file. I often want to make a note to reference later, but which I don't need immediately. It's a little extra friction to have to actually have my notes file open in a window somewhere.
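That notes-append idea could be as simple as something like this; the filename and the timestamped-bullet format are just my own choices, and in a real setup the hotkey handler would route the transcript here instead of to the clipboard.

```python
# Sketch: append a transcription to a running notes file without
# having the file open in an editor anywhere.
from datetime import datetime
from pathlib import Path

def append_note(text: str, notes: Path = Path("notes.md")) -> None:
    """Append one timestamped bullet to the notes file, creating it if needed."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with notes.open("a", encoding="utf-8") as f:
        f.write(f"- [{stamp}] {text}\n")

# e.g. the "notes" keybind would call:
# append_note(transcribed_text)
```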
Will keep an eye on epicenter, appreciate the ethos.
Thank you for the support, and agreed on OS-level integration. At least for me, I have trouble trusting any app unless it's open source with a transparent codebase I can audit :)
If you want a rabbit hole to go down, look into Cursorless, Talon Voice, and that whole sphere.
They're actually dictating code, but they do it in a rather smart way.