They might not now how whisper works. I suspect that the answer to their question is 'yes' and the reason they can't find a straightforward answer through your project is that the answer is so obvious to you that it's hardly worth documenting.
Whisper for transcription tries to transform audio data into LLM output. The transcripts generally have proper casing, punctuation and can usually stick to a specific domain based on the surrounding context.