That's really interesting about medium being better than large. I never bothered trying the smaller models since the big ones were fast enough.

Benchmarks definitely say otherwise, but in my anecdotal experience medium is the best for this application with my voice and microphone.