Presumably anyone with a multimodal transformer already pretrained on Human data could be further pretrained on animal vocalizations. I don't know whether any of the large model owners are doing this.