If you put this in the data processing agreement and make it available (a GDPR requirement in the EU, and many other countries have similar rules), it's pretty much fine. You can probably also use the recordings for training and improvement purposes, provided they are destroyed after a short duration. Speech is personal information but not medical, so no special rules apply, at least as long as you aren't using the voice to identify the speaker. The laws in most countries are quite clear about non-medical personal information, and translation is a good ground for using personal information as far as I can see.
You do have to specify the usage, however. That is what most companies balk at, because they want to store things for later use and for unknown purposes, which is frowned upon. You also need to protect personal data adequately, a requirement that is deliberately vague. Storing it unencrypted, for instance, is not considered adequate. The same applies to data in transit.
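To make the purpose and retention points a bit more concrete, here is a rough sketch of how they could be enforced in code. Everything in it is an assumption for illustration: the purpose names, the retention periods, and the `Recording`, `store` and `purge_expired` helpers are all hypothetical, and the real values would be whatever your agreement actually states. It is not legal advice and it does not cover encryption at rest or in transit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention period per declared purpose. The real purposes and
# durations are whatever the data processing agreement specifies.
RETENTION = {
    "translation": timedelta(hours=1),
    "model_improvement": timedelta(days=30),
}


@dataclass(frozen=True)
class Recording:
    recording_id: str
    purpose: str          # must be one of the declared purposes
    stored_at: datetime   # timezone-aware timestamp


def store(recording_id: str, purpose: str, now: datetime) -> Recording:
    """Accept a recording only if its purpose was declared up front."""
    if purpose not in RETENTION:
        raise ValueError(f"undeclared purpose: {purpose!r}")
    return Recording(recording_id, purpose, now)


def purge_expired(recordings: list[Recording], now: datetime) -> list[Recording]:
    """Return only the recordings still inside their retention period."""
    return [r for r in recordings if now - r.stored_at < RETENTION[r.purpose]]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    kept = purge_expired(
        [store("a", "translation", t0), store("b", "model_improvement", t0)],
        now=t0 + timedelta(hours=2),
    )
    print([r.recording_id for r in kept])
```

The point of the design is that "store it for later, purpose unknown" simply cannot be expressed: a recording without a declared purpose is rejected, and every purpose carries its own deletion deadline.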