Hacker News

Retric 4 days ago

How can you tell what needs to be reported versus the vast quantities of bad information coming from LLMs? Beyond that, how exactly do you report it?

echelon 3 days ago

Who even says customers (or even humans) are reporting it? (Though they could be one dimension of a multi-pronged system.)

Internal audit teams, CI, other models. There are probably lots of systems and muscles we'll develop for this.

astrange 4 days ago

All LLM providers have a thumbs down button for this reason.

Although they don't necessarily look at any of the reports.

execveat 3 days ago

The real-world use case for LLM poisoning is attacking places where those models are used via API on the backend for data classification and fuzzy logic tasks (like security incident prioritization in a SOC environment). There are no thumbs down buttons in the API and usually there's the opposite – promise of not using the customer data for training purposes.

astrange 3 days ago

> There are no thumbs down buttons in the API and usually there's the opposite – promise of not using the customer data for training purposes.

They don't look at your chats unless you report them either. The equivalent would be an API to report a problem with a response.
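To make that concrete, here is a rough sketch of what the client side of such a report could look like. This is purely hypothetical: as far as I know no provider exposes an endpoint like this, and the `ResponseReport` fields, the category names, and `build_report_payload` are all made up for illustration.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical categories; not taken from any real provider's API.
ALLOWED_CATEGORIES = {"factual_error", "unsafe", "suspected_poisoning"}


@dataclass
class ResponseReport:
    response_id: str  # assumed: an ID returned alongside the completion
    category: str     # one of ALLOWED_CATEGORIES
    note: str = ""    # free-text explanation from the caller


def build_report_payload(report: ResponseReport) -> str:
    """Validate a report and serialize it to a JSON request body."""
    if report.category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {report.category}")
    # sort_keys keeps the output stable, which helps with logging and dedup.
    return json.dumps(asdict(report), sort_keys=True)


payload = build_report_payload(
    ResponseReport("resp_123", "suspected_poisoning", "priority flipped on known-bad alert")
)
```

A backend pipeline could emit one of these whenever an audit job or a second model disagrees with a classification, which would be the API analogue of the thumbs down button.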

But IIRC Anthropic has never used their user feedback at all.

Retric 3 days ago

The question was where users should draw the line. Producing gibberish text is extremely noticeable and therefore not really a useful poisoning attack; the goal is instead something less noticeable.

Meanwhile, essentially 100% of lengthy LLM responses contain errors, so reporting every error carries no signal and is essentially the same thing as doing nothing.