Can you draw the connection more explicitly between political biases in LLMs (or training data) and common-sense reasoning task failures? I understand that there are lots of bias issues there, but it's not intuitive to me how this would lead to a greater likelihood of failure on this kind of task.

Conversely, did labs that tried to counter some biases (or push them in a different direction) end up with better scores on metrics for other model capabilities?

A striking thing about human society is that even when we interact with others who have very different worldviews from our own, we usually manage to communicate effectively about everyday practical tasks and our immediate physical environment. We do run into the inferential-distance problem once we start talking about concepts that aren't culturally shared, but we can usually talk just fine about who and what is where, what we want to do right now, whether it's possible, and so on.

Are you suggesting that many LLMs are falling down on these corresponding immediate, concrete communicative and practical-reasoning tasks specifically because of their political biases?