Hacker News

bonoboTP 13 hours ago

Vision and audio are already in use in multimodal LLMs, so it's already possible.

lelanthran 10 hours ago

Who said anything about vision and audio?