For a while, some people dismissed language models as “stochastic parrots”. They said models could just memorise statistical patterns, which they would regurgitate back to users.

The problem with this theory is that, alas, it isn’t true.

If a language model were just a stochastic parrot, when we looked inside to see what was going on, we’d basically find a lookup table. … But it doesn’t look like this.

But does that matter? My understanding is that if you don’t inject randomness (“temperature”) into a model while it’s running, it will always produce the same output for the same input: in effect, a lookup table. The fancy stuff the article describes happening inside is, in effect, [de]compression of that lookup table.
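A toy sketch of that claim (a fixed-weight stand-in, not a real language model): with greedy decoding and no sampling noise, the output is a pure function of the input, so the same prompt always yields the same continuation; inject temperature sampling and it no longer does.

```python
import numpy as np

VOCAB = 16
weights_rng = np.random.default_rng(0)           # fixed seed: frozen "parameters"
W = weights_rng.standard_normal((VOCAB, VOCAB))  # next-token logits per current token

def logits(context):
    # Toy "model": logits depend only on the last token of the context.
    return W[context[-1]]

def decode(prompt, steps, temperature=0.0, seed=None):
    rng = np.random.default_rng(seed)
    out = list(prompt)
    for _ in range(steps):
        l = logits(out)
        if temperature == 0.0:
            nxt = int(np.argmax(l))              # greedy: no randomness, fully deterministic
        else:
            p = np.exp(l / temperature)          # softmax with temperature
            p /= p.sum()
            nxt = int(rng.choice(VOCAB, p=p))    # sampling: randomness injected here
        out.append(nxt)
    return out

print(decode([3], 5))           # same output every run,
print(decode([3], 5))           # identical to the line above
print(decode([3], 5, 0.8, 1))   # with temperature, the continuation depends on the seed
print(decode([3], 5, 0.8, 2))
```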

Of course, maybe that’s all human intelligence is too (the whole ‘free will is an illusion in a deterministic universe’ argument is all about this) - but just because the internals are fancy and complicated doesn’t mean it’s not a lookup table.

Everything can be represented as a lookup table, or at least everything we can rigorously reason about: set theory can serve as a foundation of mathematics, and relations there are just sets of pairs (essentially lookup tables).
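A small illustration of the set-theoretic point, using a toy squaring function: a function is just a set of (input, output) pairs, i.e. a lookup table, and on a finite domain the table and the rule are interchangeable.

```python
# Relation as a set of pairs: the "lookup table" for squaring on a small domain.
square_as_table = {x: x * x for x in range(10)}

def square_as_rule(x):
    return x * x

# The table and the rule agree everywhere on the domain.
assert all(square_as_table[x] == square_as_rule(x) for x in range(10))
```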

I guess that means we can throw away the notion that "it can be represented as a lookup table" carries any profound meaning, at least without further qualification: is the lookup table finite or infinite, can it be constructed in time polynomial in the number of entries, and so on.
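One way to see why those qualifications matter, again using squaring as a stand-in: over 64-bit integers the lookup table still exists in the set-theoretic sense, but evaluating the rule is trivial while materialising the table is hopeless.

```python
# Squaring over 64-bit integers "is" a lookup table with 2**64 entries.
DOMAIN_SIZE = 2**64

def square(x):
    return x * x                  # constant-time rule

print(square(123_456_789))        # cheap to evaluate for any single input
# Storing every (x, x*x) pair at ~16 bytes per entry would need:
print(f"~{DOMAIN_SIZE * 16:.1e} bytes")   # roughly 3e20 bytes, far beyond any storage
```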