>It's like trying to understand why a person likes the color red, but not the color blue, using a database recording the position, makeup, and velocity of every atom in their brain.

But this is an incredibly interesting problem!

Anthropic have done some great work on neural network interpretability that gets at the core of this problem.