Companies are going to install an AI inside their own proprietary systems full of proprietary and confidential data and PII about their customers and prospects and whatnot, and let it run around and click on random buttons and submit random forms?

Really??!? What could possibly go wrong.

I'm currently trying to do a large OCR project using the Google Vision API, and then Gemini 1.5 Pro 002 to parse and reconstruct the results (taking advantage, one hopes, of its big context window). As I'm not familiar with the Google Vision API, I asked Gemini to guide me in setting it up.

Gemini is the latest Google model; Vision, as the name implies, is also from Google. Yet Gemini makes several egregious mistakes about Vision: it gets the names of fields and options wrong, and so on.

Gemini 1.5 "Pro" also suggests that concatenating two JSON strings produces a valid JSON string; when told that's unlikely, it's very sorry and apologizes profusely, but it still made the mistake in the first place.
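For what it's worth, the failure mode is trivial to demonstrate; a minimal sketch in Python (the field names are made up for illustration):

```python
import json

# Two separately serialized JSON documents (hypothetical OCR results)
a = json.dumps({"page": 1, "text": "foo"})
b = json.dumps({"page": 2, "text": "bar"})

# Naive concatenation yields '{"page": 1, ...}{"page": 2, ...}',
# which is two documents back to back, not one valid JSON value.
try:
    json.loads(a + b)
except json.JSONDecodeError:
    print("concatenation is not valid JSON")

# The safe route: parse both, combine in Python, re-serialize.
merged = json.dumps([json.loads(a), json.loads(b)])
json.loads(merged)  # parses fine: a two-element JSON array
```

(Streaming formats like JSON Lines sidestep this by design, but plain JSON doesn't concatenate.)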

LLMs can be useful when used with caution; letting one loose in an enterprise environment doesn't feel safe, or sane.

LLMs can't reason, or to be precise, can't reason logically; what they are really good at is recall.

So if you want accurate results when writing code, you need to put all the docs into the input and THEN ask your question. Download all the docs on Vision, put them in the Gemini prompt, then ask your question (or ask for code showing how to use Vision), and you'll get much closer to the truth.
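The docs-first approach above can be sketched as a plain prompt-builder; `build_prompt` is a hypothetical helper (not part of any Gemini SDK), and the directory layout is assumed:

```python
from pathlib import Path

def build_prompt(docs_dir: str, question: str) -> str:
    """Concatenate downloaded reference docs ahead of the question,
    so the model answers from the supplied text rather than from
    its (possibly stale or confabulated) training-time memory."""
    docs = "\n\n".join(
        p.read_text() for p in sorted(Path(docs_dir).glob("*.md"))
    )
    return (
        "Here is the reference documentation for the Vision API:\n\n"
        f"{docs}\n\n"
        "Using ONLY the documentation above, answer the following:\n"
        f"{question}"
    )
```

The resulting string is what you'd send as the model input; with a long-context model the whole doc set can fit, which is the point of the exercise.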

Have you tried any others? From what I have tried, Gemini makes the most mistakes of them all.

I have tried many others for many other things (via OpenRouter) but I have never compared LLMs on the exact same task; it's confusing enough with one engine... ;-)

Sonnet 3.5 for coding is fine but makes "basic" mistakes all the time. Using LLMs is at times like dealing with a senior expert suffering from dementia: it has arcane knowledge of a lot of things but suddenly misses the obvious, something that would not escape an intern. It's weird, really.