Interesting. My Pixel 7 transcription is barely usable for me. Makes way too many mistakes and defeats the purpose of me not having to type, but maybe that's just my experience.
The latest open source local STT models people are running on devices are significantly more robust (e.g. Whisper models, Parakeet models, etc.). So background noise, mumbling, and/or just not having a perfect audio environment don't trip up the SoTA models as much (though all of them still do get tripped up).
I work in voice AI and use these models (both proprietary and local open source) every day. It's a night-and-day difference for me.
I've built my own STT apps to test Whisper, and while it's good, it does hallucinate quite a bit if there's noise, and sometimes even when the audio is perfectly clear.
It often gives the illusion of being very good, but I could record a half hour of me speaking and discover some very random stuff in the middle that I did not say.
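For what it's worth, here's a rough sketch of how you might flag the suspicious parts of a long transcript instead of rereading the whole thing. It assumes you have the `segments` list that the open source `openai-whisper` package returns from `transcribe()`, where each segment carries `avg_logprob`, `no_speech_prob`, and `compression_ratio`. The function name `flag_suspect_segments` is made up for illustration, and the thresholds just mirror that package's defaults, so treat them as starting points. It will not catch a fluent, confident hallucination on clear audio.

```python
# Thresholds mirror the defaults used by openai-whisper's transcribe();
# they are assumptions here and should be tuned for your own audio.
LOGPROB_MIN = -1.0      # below this, the decoder was not confident
NO_SPEECH_MAX = 0.6     # above this, the window was probably silence or noise
COMPRESSION_MAX = 2.4   # above this, the text is suspiciously repetitive


def flag_suspect_segments(segments):
    """Return (start, end, text, reasons) for segments worth a manual check.

    Hypothetical helper: `segments` is a list of dicts shaped like the
    `segments` entries in openai-whisper's transcription result.
    """
    suspects = []
    for seg in segments:
        reasons = []
        if seg.get("avg_logprob", 0.0) < LOGPROB_MIN:
            reasons.append("low confidence")
        if seg.get("no_speech_prob", 0.0) > NO_SPEECH_MAX:
            reasons.append("likely silence")
        if seg.get("compression_ratio", 0.0) > COMPRESSION_MAX:
            reasons.append("repetitive text")
        if reasons:
            text = seg.get("text", "").strip()
            suspects.append((seg.get("start"), seg.get("end"), text, reasons))
    return suspects


if __name__ == "__main__":
    # Made-up example segments, not real model output.
    example = [
        {"start": 0.0, "end": 4.0, "text": " Hello there.",
         "avg_logprob": -0.2, "no_speech_prob": 0.01, "compression_ratio": 1.1},
        {"start": 4.0, "end": 9.0, "text": " Thanks for watching!",
         "avg_logprob": -1.4, "no_speech_prob": 0.8, "compression_ratio": 1.0},
    ]
    for start, end, text, reasons in flag_suspect_segments(example):
        print(f"[{start:.1f}s-{end:.1f}s] {text!r}: {', '.join(reasons)}")
```

With a real run you would pass `result["segments"]` from `model.transcribe(...)` and then listen only to the flagged time ranges.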