How does it compare with https://github.com/KoljaB/RealtimeVoiceChat, which is absent from the benchmark?

That's not a turn-taking model, it's just a silence-detection Python script based on whatever text comes out of Whisper...

I haven’t tried that one yet, I’ll check it out.