I'm honestly confused why we can't determine how LLMs come to their decisions in the general sense. Is it not possible to log every step as the neural network / vector db / magic happens? Is it merely impractical, or is it actually something that's genuinely difficult to do?
My understanding is that it's neither impractical nor genuinely difficult, it's just that the "logging every step" approach provides explanations of their "reasoning" that are completely meaningless to us, as humans. It's like trying to understand why a person likes the color red, but not the color blue, using a database recording the position, makeup, and velocity of every atom in their brain. Theoretically, yes, that should be sufficient to explain their color preferences, in that it fully models their brain. But practically, the explanation would be phrased in terms of atomic configurations in a way that makes much less sense to us than "oh, this person likes red because they like roses".
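To make that concrete: here's roughly what "logging every step" looks like in practice, as a minimal sketch assuming a small open model like GPT-2 loaded through Hugging Face transformers (the model, layer choice, and prompt are just placeholders). You really can capture every intermediate value; the problem is what you're left holding afterwards.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

dumps = []

def log_activations(module, inputs, output):
    # output[0] is the block's hidden state: (batch, seq_len, 768) raw floats
    dumps.append(output[0].detach().clone())

# Attach a hook to every transformer block, so nothing is "hidden" from us
for block in model.transformer.h:
    block.register_forward_hook(log_activations)

ids = tok("Why do people like red?", return_tensors="pt")
with torch.no_grad():
    model(**ids)

# Every step is now logged... as tens of thousands of context-free numbers.
print(len(dumps), dumps[0].shape)  # e.g. 12 layers, each (1, seq_len, 768)
print(dumps[0][0, 0, :8])          # something like tensor([ 0.13, -1.02, ...])
```

That dump is the "database of every atom": complete, faithful, and almost entirely uninformative to a human reading it.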
>It's like trying to understand why a person likes the color red, but not the color blue, using a database recording the position, makeup, and velocity of every atom in their brain.
But this is an incredibly interesting problem!
Anthropic have done some great work on neural interpretability that gets at the core of this problem.
Everything happens in an opaque, super-high-dimensional numerical space that was "organically grown", not engineered, so we don't really understand what's going on.
It would be like logging a bunch of random noise from anyone's perspective except the LLM's.
I guess I'm also just confused. I get that this is _difficult_ to do, but I would think that computer scientists would be utterly dissatisfied that AI was "non-deterministic" and would poke at the problem until it could be understood.
There are people doing both. Look up surveys of mechanistic interpretability of language models and surveys of explainable AI for neural networks. Those will give you many techniques for illustrating what's happening.
You'll also see why their applications are limited compared to what you probably hoped for.
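If you want a feel for one of the simpler ideas in those surveys, here's a rough sketch of the "logit lens": project each layer's intermediate hidden state through the model's own unembedding and see which token it's leaning toward at that depth. (Again using GPT-2 via Hugging Face purely as a stand-in; the prompt is made up.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_hidden_states=True)

# out.hidden_states: one tensor per layer, each (1, seq_len, 768)
for i, h in enumerate(out.hidden_states):
    h = model.transformer.ln_f(h[:, -1])     # apply the final layer norm
    logits = model.lm_head(h)                # unembed into vocabulary space
    print(i, tok.decode(logits.argmax(-1)))  # this layer's "best guess" token
```

Useful for seeing roughly where a prediction forms, but it's a long way from "explaining the reasoning", which is the limitation the surveys keep running into.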
GPT-4 allegedly has 1.8 trillion parameters.
Imagine having a bunch of 2D matrices with a combined 1.8 trillion total numbers, from which you pick out blocks of numbers in a loop and finally merge and combine them to form a token.
Good luck figuring out what number represents what.
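In toy form, something like the loop below is all that's happening, just with the matrices scaled up by many orders of magnitude (at the alleged 1.8 trillion parameters, the weights alone are roughly 3.6 TB at 2 bytes each). This is not a real transformer, just a sketch of the shape of the computation; note that nothing in it carries a label.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 8, 50, 4

x = rng.normal(size=(1, d_model))                 # current hidden state
layers = [rng.normal(size=(d_model, d_model)) for _ in range(n_layers)]
unembed = rng.normal(size=(d_model, vocab))

for W in layers:                                  # pick out blocks of numbers in a loop
    x = np.tanh(x @ W)                            # mix them together
token_id = int((x @ unembed).argmax())            # merge and combine them to form a token
print(token_id)                                   # which of those numbers "mattered"? no idea
```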
Wouldn't that mean it's totally impractical for day-to-day usage, but a researcher or team of researchers could solve this?
Anthropic has a tool that lets them do this, but apparently doing it for even one prompt can take an entire day of work.
That’s so much faster than I expected