Wait, so to get this straight, Anthropic knows:
1) LLMs are non-deterministic
2) This class of models has a particular tendency to "misbehave"
3) Their classifiers have a high rate of false positives
4) Millions of people give these models access to their machines
And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?
I think this has a name. I think it may be called malware.
That is the perfect description. Malware! What is sad is that there is no going back from this. Now that we know they do this, I'll never believe they aren't doing it in other domains, or won't extend it to other domains in the future. This is probably the worst thing they could possibly have done for trust.
... that you pay to install on your machine.