What do you use vision for? I have failed to find a workflow with it that makes sense. When I ask it to review screenshots of websites or whatever, it misses extremely obvious details like text flowing out of its container or overlapping other text, things being in entirely the wrong place, etc.
This is via MCP or a custom call to the lowest-cost model.
Someone did an agentic setup with a webcam capturing another computer's BIOS/boot screen: capture -> upload to image model -> back to agent.
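For anyone wondering what that capture -> image model -> agent loop looks like in code, here is a minimal sketch. It is not the setup described above, just one way the pieces could fit together. Every name here (`run_vision_loop`, `capture_frame`, `describe_image`, `decide_action`) is hypothetical, and the webcam and model calls are replaced with stand-ins so it runs without hardware or an API key.

```python
from typing import Callable, List, Optional


def run_vision_loop(
    capture_frame: Callable[[], bytes],
    describe_image: Callable[[bytes], str],
    decide_action: Callable[[str], Optional[str]],
    max_steps: int = 5,
) -> List[str]:
    """Repeat capture -> image model -> agent until the agent returns None."""
    actions: List[str] = []
    for _ in range(max_steps):
        frame = capture_frame()              # assumed: one webcam frame of the other screen
        description = describe_image(frame)  # assumed: a call to a low-cost vision model
        action = decide_action(description)  # assumed: agent picks the next keypress
        if action is None:                   # agent signals it is done
            break
        actions.append(action)
    return actions


# Stand-ins (made-up data) so the sketch is runnable as-is.
frames = iter([b"frame-1", b"frame-2", b"frame-3"])
fake_descriptions = {
    b"frame-1": "BIOS main menu",
    b"frame-2": "Boot order page",
    b"frame-3": "Settings saved",
}


def fake_decide(description: str) -> Optional[str]:
    if "saved" in description:
        return None
    return "press F10" if "Boot order" in description else "press RIGHT"


actions = run_vision_loop(
    lambda: next(frames),
    fake_descriptions.__getitem__,
    fake_decide,
)
print(actions)
```

Passing the three steps in as plain callables keeps the loop independent of whichever camera library or model API you end up using, and `max_steps` stops the agent from looping forever if the model keeps misreading the screen.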
What models have you tried? Gemini 3.1 Pro has vision capable of reading my sloppy diaries from 10 years ago, down to small glyphs and doodles.
I mean, they mostly work for OCR. I meant in a coding context.
For coding?