Hacker News

it would be a really great option if it didn't lack vision

this is solvable with MCP or a custom call to the lowest-cost model

someone did a webcam + agentic setup: capture of another computer's BIOS/boot screen -> upload to image model -> back to agent

what do you use vision for? I have failed to find a workflow with it that makes sense. When I ask it to review screenshots of websites or whatever, it misses extremely obvious details like text flowing out of its container or overlapping other text, things being in entirely the wrong place, etc.

bckr 5 hours ago

What models have you tried? Gemini 3.1 pro has vision capable of reading my sloppy diaries from 10 years ago, down to small glyphs and doodles.

RugnirViking 5 hours ago

I mean, they mostly work for OCR; I meant in a coding context.

cromka 10 hours ago

For coding?