Precisely this! Too many confounding variables to look at such a surface level, a few high-level population wide stats. Nothing is that simple, nothing is clean in this sort of thing. Messy, interrelated factors, and you can chip away at the question bit by bit to reveal things but that is what it takes, not this do-it-with-vibes approach that's been around long before agents started taking prompts to code.
My hope is that agentic analysis that does this tedious methodical chipping away, comparative cross referencing of seemingly disparate datasets, will help shift society the tiniest bit away from law & policy making via hot takes that make even the well intentioned fall on their face with poor reasoning and the more cynical wield ambiguity a cudgel of control by any emotion they can incite, usually not the good ones.
Let’s see how it goes. I don’t think safety is properly being measured. Much like you can’t measure a beautiful property or a delicious dish. It has to be experienced.
Moreover, AI guardrails will interfere with you identifying any meaningful anthropological conclusions.