I don't think it's a stretch that you can train/align a model to avoid "hatespeech" or other topics deemed $Unacceptable you can align a model to favor a certain ideological viewpoint and have that alignment subtly influence the output.
How do most Chinese models handle Tienanmen square or discussions on Han superiority?
Oh sure, no one said you can't train a model to do this. You certainly can.
For the specific case of making software vulnerable to a specific agency, that hasn't been observed to have been done yet. Not because it can't be, but because no one has for now.
If it were done, it would be easy(ish) to detect, since it'll be reproducible.
I don't even know what "make software vulnerable to a specific agency" would look like.
Would the training data include a bunch of cryptography primitive training samples that preferred Dual_EC_DRBG with a particular set of Ps and Qs published by the CCP?
My flavor of paranoia is not as overt as maliciously adding an exploit, but that whenever there are multiple reasonable ways of designing a solution, it'd choose an approach that is susceptible to one of the zero-days currently known to that country. I don't see how reproducibility would help you there.
> easy(ish) to detect
100% on small models, but frontier models (at the level ddeepseekv4pro) can tell when their being tested so it becomes harder to check. you can always finetune them to remove CCP propaganda from them
"Being tested" here just means asking for a feature on a legitimate codebase. The larger models don't magically know the user's ulterior motives.
> How do most Chinese models handle Tienanmen square or discussions on Han superiority?
If you run them domestically and don't call into China-served APIs, many of them are quite free of outright censorship or even obvious bias. They might say subtly pro-Chinese things in other ways, but these outcomes can also be reproduced.