I think it really depends on the legal definition of recording or what it's used for.
Common sense says that a recording that exists for only a few seconds, is used only by the person the speaker intends to address, and is never permanently stored should be fine. And we can assume Apple has made sure this is legal in its home state of California.
But EU law might not have sufficient legal clarity on this if it was written in a particularly open-ended way.
If you put this in the data processing agreement and make that available (a requirement under GDPR and similar laws in most countries), it's pretty much fine. You can probably also use the recordings for training and improvement, as long as they are destroyed after a short time. Speech is personal information but not medical, so no special rules apply. The laws in most countries are quite clear about non-medical personal information, and translation is a good ground for using personal information as far as I can see.
You do have to specify the usage, however. That is what most companies complain about, because they want to store things for later use and for unknown purposes, which is frowned upon. You also need to protect personal data adequately, a requirement that is deliberately vague. Storing it unencrypted, for instance, is not considered adequate, and the same applies to data in transit.
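To make the "declared purpose, short retention" idea concrete, here is a rough Python sketch of how an app might hold audio clips. Everything in it is made up for illustration (`EphemeralClip`, `ClipStore`, the purpose strings, the retention times), it is not how Apple or anyone else actually does it, and it is not legal advice. It also leaves out encryption at rest and in transit, which you would still need on top.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EphemeralClip:
    """Hypothetical record: audio tagged with a declared purpose and a lifetime."""
    audio: bytes
    purpose: str
    ttl_seconds: float
    created_at: float = field(default_factory=time.monotonic)

    def expired(self, now=None):
        # A clip is expired once its lifetime has fully elapsed.
        now = time.monotonic() if now is None else now
        return now - self.created_at >= self.ttl_seconds


class ClipStore:
    """Hypothetical in-memory store that only accepts declared purposes."""

    def __init__(self, allowed_purposes):
        self.allowed_purposes = frozenset(allowed_purposes)
        self._clips = []

    def add(self, audio, purpose, ttl_seconds):
        # Refuse anything collected for a purpose that was never declared.
        if purpose not in self.allowed_purposes:
            raise ValueError(f"undeclared purpose: {purpose!r}")
        clip = EphemeralClip(audio, purpose, ttl_seconds)
        self._clips.append(clip)
        return clip

    def purge(self, now=None):
        # Drop expired clips and report how many were removed.
        kept = [c for c in self._clips if not c.expired(now)]
        removed = len(self._clips) - len(kept)
        self._clips = kept
        return removed

    def __len__(self):
        return len(self._clips)
```

You would call `purge()` on a timer or before every read. The point of the design is that "store it for later, purpose unknown" is simply not expressible: `add()` rejects any purpose that was not declared up front.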