I agree. They're voluntarily adding fingerprints to images, so I expect the default voice is intentional, and it wouldn't surprise me at all (though I have no evidence of this) if the output text has a fingerprint steganographically embedded in it.