Also look at Vibe:

It even supports speaker differentiation/recognition (diarization), and it's open source and runs on macOS, Windows, and Linux.

https://github.com/thewh1teagle/vibe

It uses Whisper, but it also calls other tools directly and wraps everything in one nice GUI.