Wouldn't we need several hundred gigabytes of ingestible/structured contextual info for animal vocalizations in order to train a model with any accuracy? Even if we had it, it seems to me the model would only be able to tell us what sounds probably "should" follow those of a given recording, not what they mean.

We could train a transformer to predict the next token, whether it's the next sound from one animal or a sound from another animal replying to it. However, we wouldn't understand the majority of what it means, except for the most obvious sounds whose meaning we could derive from context and observation of behavior. This wouldn't result in a ChatGPT-like interface, as it is impossible for us to translate most of these sounds into a meaningful conversation with animals.
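To make the distinction concrete: next-token prediction only models "what sound tends to follow", not meaning. A minimal sketch of that idea, using a toy bigram counter instead of a transformer, over invented sound tokens (all token names here are made up, standing in for discretized audio units):

```python
from collections import Counter, defaultdict

# Hypothetical discretized vocalization sequences (e.g. output of an
# audio tokenizer); each string is one unit of sound from either animal.
sequences = [
    ["chirp_a", "chirp_b", "trill", "chirp_a", "chirp_b", "trill"],
    ["chirp_a", "chirp_b", "trill", "reply_x", "reply_y"],
    ["chirp_a", "chirp_b", "trill", "reply_x", "reply_y"],
]

def train_bigram(seqs):
    """Count how often each token follows each context token."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation of `token`, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

model = train_bigram(sequences)
print(predict_next(model, "trill"))  # → reply_x: a likely continuation, not a meaning
```

The model learns that "reply_x" tends to follow "trill", but nothing in it says what either sound refers to, which is exactly the gap being discussed.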

Why not label a fine-tuning dataset with human descriptions based on video recordings? We would explain in human language what the animals are doing, then tune the model on those pairs. The dataset wouldn't need to be very large, but it would let models translate bird calls directly into human language.
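The pairing idea can be sketched in miniature. Here a nearest-neighbor lookup over labeled (call, description) pairs stands in for a fine-tuned model; all token names and descriptions are invented for illustration:

```python
# Hypothetical labeled pairs: call token sequences with human descriptions
# derived from video observation (all data here is made up).
labeled = [
    (("alarm_hi", "alarm_hi", "alarm_lo"), "predator overhead, flock scatters"),
    (("beg_a", "beg_a", "beg_b"), "chick begging parent for food"),
    (("song_x", "song_y", "song_x"), "male advertising territory at dawn"),
]

def translate(call, dataset):
    """Toy 'translation': return the description of the labeled call
    sharing the most tokens with `call` (a crude stand-in for what a
    fine-tuned model would generalize)."""
    def overlap(a, b):
        return len(set(a) & set(b))
    best = max(dataset, key=lambda pair: overlap(call, pair[0]))
    return best[1] if overlap(call, best[0]) > 0 else "unknown"

print(translate(("alarm_lo", "alarm_hi"), labeled))  # → predator overhead, flock scatters
```

Even in this toy form the weakness is visible: the output is only as good as the behavioral labels, which is the objection raised next.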

What if they just sit and talk? What would the description of that be? What if only part of the communication is relevant? What if it's not relevant at all because they reacted to atmospheric changes? Or to electromagnetic signals that can't be observed on video? Or to smell? Or to sound outside the human hearing range? What if the decision based on the communication is deferred? And so on.

As I mentioned before, only the most obvious examples of behaviors and context can be translated into anything meaningful.

But then it's not a translation of the bird tweets, but more like a predictive mapping from tweets to behaviors.

Reminds me of Wittgenstein: "If a lion could speak, we could not understand him."