Anthropic is not a disinterested party here, and until their experiments can be replicated from an adversarial standpoint (i.e. one assuming the null hypothesis) by people without a vested interest in hyping up the tech, I wouldn't consider those experiments to be "good evidence".