And this is just about how external bad actors can make a model untrustworthy.
What prevents AI companies from serving their own interests (or the interests of a malicious, fascist government) by moderating the training in certain ways? It can be subtle, with consequences that aren't recognizable right away. Didn't Musk already complain about Grok being "too woke"?
And how can I trust those companies with my own data?
I’m kind of shocked by how few people are asking this question. It’s well documented how Elon has been desperate to steer Grok away from “being too woke” without it going full MechaHitler [1], and he still hasn’t been able to find the right balance. Does this research point to a way he could get closer to that goal?
[1] https://youtu.be/r_9wkavYt4Y