Run with normal prompt > record neural activations
Run with ALL CAPS PROMPT > record neural activations
Then compare/diff them.
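For what it's worth, here is a minimal sketch of that record-and-diff loop in Python. Everything in it is a made-up stand-in: `featurize`, `run_and_record`, and the tiny random two-layer network are hypothetical and only play the role of the real model, which would be far larger. The only thing it's meant to show is the shape of the procedure: same network, two prompts, subtract the recorded activations.

```python
import numpy as np


def featurize(prompt: str) -> np.ndarray:
    """Hypothetical stand-in for tokenization/embedding: three hand-made features."""
    letters = [c for c in prompt if c.isalpha()]
    n = max(len(letters), 1)
    upper_frac = sum(c.isupper() for c in letters) / n
    vowel_frac = sum(c.lower() in "aeiou" for c in letters) / n
    length = min(len(prompt) / 100.0, 1.0)
    return np.array([upper_frac, vowel_frac, length])


# Toy network with fixed random weights (an assumption, not a real language model).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 8))


def run_and_record(prompt: str) -> dict:
    """Run the toy network and return the activations of each layer."""
    x = featurize(prompt)
    h1 = np.tanh(x @ W1)
    h2 = np.tanh(h1 @ W2)
    return {"layer1": h1, "layer2": h2}


normal_prompt = "please summarize this article"
caps_prompt = normal_prompt.upper()

acts_normal = run_and_record(normal_prompt)
acts_caps = run_and_record(caps_prompt)

# The "diff": per-unit change in activation between the two runs.
diff = {name: acts_caps[name] - acts_normal[name] for name in acts_normal}

for name, delta in diff.items():
    top = int(np.argmax(np.abs(delta)))
    print(f"{name}: most changed unit = {top}, delta = {delta[top]:+.3f}")
```

The two prompts differ only in capitalization, so any unit that moves is responding to that one change, which is the whole point of holding everything else fixed.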
It does sound almost too simple to me too, but then lots of ML things sound like "but yeah, of course, duh" once they've been "discovered". I guess that's the power of hindsight.
That's also reminiscent of neuroscience studies with fMRI, where the methodology is basically:
fMRI during task - fMRI during control = brain areas involved with the task
In fact, it's effectively the same idea. I suppose in both cases the processes in the network are too complicated to usefully analyze directly, and yet the basic principles are simple enough that this comparative procedure gives useful information.