Same, I really like the ONNX format. I only wish that they weren't so frustratingly difficult to use on Apple iOS. Their browser engine, WebKit, has become annoyingly restrictive over the years in terms of the working memory cap.

I ran into quite a few out-of-memory iOS safari issues when I was building continuous voice recognition for my blind chess game, so people could play while on the go.

Interesting, what use cases are you using onnx for btw?

So I use a VAD onnx (Silero [1]) to automatically detect when someone is talking, and then it sends the audio into one of the voice recognition libraries.

I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on gigaspeech. It’s significantly more accurate, but it also comes with a correspondingly larger memory footprint.

Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones) which can easily OOM.

[1] - https://github.com/snakers4/silero-vad

[2] - https://shahkur.specr.net

[3] - https://github.com/k2-fsa/sherpa-onnx