I see your point. I thought the first one was already known when DeepSeek came out. The Perplexity team showed how they removed this kind of bias via fine-tuning, and their fine-tune could answer sensitive questions. I mistakenly thought you went for the second since that part is new and interesting.
I definitely need help with the second part. It is a much harder claim to verify or dismiss. I also want to stress (as I do in several other comments) that this could be done even without sleeper agents (see the Anthropic paper), just with censoring.
What I want to fight the most is outright dismissal of what is at least partially testable. We're a community of techies, so shouldn't we be trying to verify or disprove the claims? I'm asking for help with that because the stronger claim is harder to settle. We have no chance of figuring out the why, but hopefully we can avoid more disinformation. I just want us to stop arguing out of our asses and fighting over things we don't know the answers to. I want to find the answers, because I don't know what they are.