>Aside: Are there any models for understanding voice to text, fully offline, without training?

OpenAI's whisper is a few years old and pretty solid.

https://github.com/openai/whisper

Whisper tends to fill silence with random garbage from its training set. [0] [1] [2]

[0]: https://github.com/openai/whisper/discussions/679 [1]: https://github.com/openai/whisper/discussions/928 [2]: https://github.com/openai/whisper/discussions/2608