Hacker News

cma 5 months ago

Some multimodal models may have a hidden captioning step that consumes completion tokens; others work on a fully native representation, and some do both, I think.