I’d like to use this to transcribe meeting minutes with multiple people. How could this program work for that use case?

If your use case is meetings, https://github.com/fastrepl/hyprnote is for you. OWhisper is more like a headless version of it.

Can you describe how it picks out different voices? Does it need separate audio channels, or does it recognize different voices on the same audio input?

It separates the mic and speaker into 2 channels, so you can reliably get "what you said" vs. "what you heard".
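
To illustrate the two-channel idea, here is a minimal Python sketch (not the actual OWhisper or Hyprnote code) that splits interleaved 16-bit stereo PCM into two mono streams. It assumes channel 0 is the mic and channel 1 is the speaker output; that ordering and the function name `split_stereo` are hypothetical.

```python
import array

def split_stereo(frames: bytes) -> tuple[bytes, bytes]:
    """Split interleaved 16-bit stereo PCM into (channel 0, channel 1) mono bytes.

    Assumption: channel 0 is the mic ("what you said") and channel 1 is the
    speaker output ("what you heard"). Samples are read in native byte order.
    """
    samples = array.array("h")   # signed 16-bit samples
    samples.frombytes(frames)    # raises ValueError on an odd byte count
    mic = samples[0::2]          # every even-indexed sample: channel 0
    speaker = samples[1::2]      # every odd-indexed sample: channel 1
    return mic.tobytes(), speaker.tobytes()
```

Each mono stream can then be transcribed on its own, so the "me" vs. "others" attribution comes from the channel rather than from voice recognition.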

For splitting speakers within a channel, we need an AI model. That is not implemented yet, but I think we'll be in good shape sometime in September.

We also have a transcript editor where you can easily split segments and assign speakers.

If you want to transcribe meeting notes, Whisper isn't the best tool because it doesn't separate the transcript by speaker. There are some other tools that do that, but I'm not sure what the best local option is. I've used Google's cloud STT with the diarization option and manually renamed "Speaker N" after the fact.
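
The renaming step is easy to script. A minimal Python sketch, assuming the transcript is plain text with labels of the form "Speaker N" (the function name `rename_speakers` and the label format are my assumptions, not any tool's documented output):

```python
import re

def rename_speakers(transcript: str, names: dict[int, str]) -> str:
    """Replace "Speaker N" labels with real names; unknown numbers are left as-is."""
    def repl(match: re.Match) -> str:
        number = int(match.group(1))
        # Fall back to the original label when no name is known for this speaker.
        return names.get(number, match.group(0))
    return re.sub(r"\bSpeaker (\d+)\b", repl, transcript)

text = "Speaker 1: Shall we start?\nSpeaker 2: Yes.\nSpeaker 3: One moment."
print(rename_speakers(text, {1: "Alice", 2: "Bob"}))
```

You still have to listen to a bit of audio to work out who each "Speaker N" is, but after that the relabeling is a single pass.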