I built speech-swift, which focuses on on-device speech processing like VibeVoice, but specifically leverages Apple Silicon's capabilities for ASR, TTS, and VAD without cloud dependency. Our ASR supports 52 languages with a real-time factor of 0.06. https://soniqo.audio/benchmarks