Hacker News

0xbadcafebee 6 hours ago

Configure a subagent in your coding harness to spin up a new sub-session with any vision model for those tasks and feed the result back to the main model. No need for "one model that does everything".
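A minimal sketch of the delegation pattern described above, not any particular harness's API. `VisionSubagent`, `main_model_turn`, and `fake_vision_model` are hypothetical names; the vision model is assumed to be any callable that takes image bytes plus a prompt and returns text.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a vision model is any callable (image_bytes, prompt) -> text.
VisionModel = Callable[[bytes, str], str]


@dataclass
class VisionSubagent:
    """Hypothetical subagent that wraps a vision model in its own sub-session."""
    describe: VisionModel

    def run(self, image: bytes, task: str) -> str:
        # Only the image and the task go in; only text comes back out.
        prompt = f"Describe the image with respect to this task: {task}"
        return self.describe(image, prompt)


def main_model_turn(task: str, image: bytes, subagent: VisionSubagent) -> str:
    """Build the text context that a text-only main model would receive."""
    description = subagent.run(image, task)
    return f"Task: {task}\nImage description (from vision subagent): {description}"


def fake_vision_model(image: bytes, prompt: str) -> str:
    # Stand-in for a real vision model call, so the sketch runs offline.
    return f"{len(image)}-byte image; prompt was: {prompt}"


context = main_model_turn(
    "find the login button", b"\x89PNG", VisionSubagent(fake_vision_model)
)
print(context)
```

The main model never sees the image itself, only the text the subagent hands back.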

ricardobeat 8 minutes ago

That doesn’t work well in a lot of scenarios. The text LLM doesn’t know what to look for in an image before it sees a description, so you might need multiple rounds of back and forth.
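The back and forth mentioned here could look roughly like the loop below. This is a sketch under assumptions: `describe_with_followups`, `ask_vision`, and `next_question` are hypothetical names, and both models are replaced by stub functions so the example runs on its own.

```python
from typing import Callable, Optional


def describe_with_followups(
    image: bytes,
    ask_vision: Callable[[bytes, str], str],
    next_question: Callable[[list[str]], Optional[str]],
    max_rounds: int = 3,
) -> list[str]:
    """Collect vision-model answers until the text model has no more questions."""
    # Round 0: a generic description, since the text model has seen nothing yet.
    notes = [ask_vision(image, "Describe this image.")]
    for _ in range(max_rounds):
        # The text model reads the notes so far and may ask a follow-up.
        question = next_question(notes)
        if question is None:
            break
        notes.append(ask_vision(image, question))
    return notes


def fake_vision(image: bytes, prompt: str) -> str:
    # Stand-in for a real vision model.
    return f"answer to: {prompt}"


def fake_text_model(notes: list[str]) -> Optional[str]:
    # Stand-in for the text LLM: asks one follow-up, then stops.
    return "What text is on the button?" if len(notes) < 2 else None


notes = describe_with_followups(b"img", fake_vision, fake_text_model)
print(notes)
```

Each extra round is another vision-model call, which is the cost the comment is pointing at.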

WASDx 4 hours ago

Are you suggesting it should summarize the image in text or generate it in HTML or something else?