No system is foolproof. They'd have to be willing to throw out some percentage of good customers along with the bots. Amazon can do that because it already has a monopoly. Anthropic can't risk it while it's trying to grab market share.
You'd think a model that scores this high on SWE-bench could easily maximize the F1 score on a precision-recall problem. What's the missing piece?
In this case, being distilled is sort of existential for them. The false positives would just mean losing some revenue (and, depending on whether those customers are profitable, not even losing profit).
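
To make the trade-off concrete, here is a rough sketch of why "maximize F1" and "minimize what a mistake costs you" can land on different thresholds. Everything in it is made up for illustration: the scores, the labels, and the 20:1 cost ratio between a missed bot and a blocked good customer are assumptions, not anything known about how Anthropic or Amazon actually detect abuse.

```python
# Toy data (hypothetical): a detector score per account, and whether it is really a bot.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1]  # 1 = bot, 0 = good customer


def confusion(scores, labels, threshold):
    """Return (tp, fp, fn) when every account scoring >= threshold is blocked."""
    tp = fp = fn = 0
    for score, is_bot in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_bot:
            tp += 1  # bot correctly blocked
        elif flagged and not is_bot:
            fp += 1  # good customer thrown out
        elif not flagged and is_bot:
            fn += 1  # bot let through
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 score: treats false positives and false negatives symmetrically."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cost(tp, fp, fn, fp_cost=1.0, fn_cost=20.0):
    """Assumed business cost: a missed bot hurts 20x more than a lost customer."""
    return fp * fp_cost + fn * fn_cost


def best_threshold(scores, labels, objective):
    """Try each observed score as a threshold; return the one maximizing objective."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: objective(*confusion(scores, labels, t)))


f1_threshold = best_threshold(scores, labels, f1)
cost_threshold = best_threshold(scores, labels, lambda tp, fp, fn: -cost(tp, fp, fn))

print("F1-optimal threshold:  ", f1_threshold, confusion(scores, labels, f1_threshold))
print("cost-optimal threshold:", cost_threshold, confusion(scores, labels, cost_threshold))
```

On this toy data the F1-optimal threshold lets one bot through, while the cost-weighted threshold catches every bot at the price of blocking three good customers. Flip the cost ratio (lost customers hurt more than missed bots) and the preferred threshold moves the other way, which is the disagreement in this thread.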