> much of the thinking happens in a much higher dimensional space that just happens to be decoded as text.

What do you mean by that? It’s literally text prediction, isn’t it?

It is text prediction. But to predict text, other things need to be calculated along the way. If you can step back for a minute, I can offer a very simple but adjacent idea that might help build an intuition for the complexity of “text prediction”.

I have a list of numbers, 0 to 9, and the + and = operators. I will train my model on this dataset, except the model won’t get the list; it will get a bunch of addition problems. A lot of them. But not every addition problem possible inside that space will be represented, not by a long shot, and neither will every number. Still, the model will be able to solve any addition problem you can form with those symbols.
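To make that concrete, here is a minimal sketch of the setup (assuming PyTorch; the architecture, number ranges, and training details are all made up for illustration, not taken from any actual experiment): a tiny character-level model trained on a subset of "a+b=c" strings, then asked to complete a problem it never saw.

```python
# Toy sketch of the "addition as text prediction" idea. Hyperparameters are
# illustrative; whether this particular tiny model generalizes well is beside
# the point - the point is that it only ever predicts the next character.
import random
import torch
import torch.nn as nn

VOCAB = list("0123456789+= ")                     # digits, operators, padding
stoi = {ch: i for i, ch in enumerate(VOCAB)}

def render(a, b):
    return f"{a}+{b}={a+b}".ljust(9)              # fixed width so batches stack

problems = [(a, b) for a in range(100) for b in range(100)]
random.shuffle(problems)
train, held_out = problems[:3000], problems[3000:]  # most pairs never seen in training

def encode(s):
    return torch.tensor([stoi[c] for c in s])

class TinyLM(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, len(VOCAB))
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.head(h)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    batch = random.sample(train, 64)
    x = torch.stack([encode(render(a, b)) for a, b in batch])
    logits = model(x[:, :-1])                     # next-character prediction only
    loss = loss_fn(logits.reshape(-1, len(VOCAB)), x[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# The interesting question: can it complete a problem it never saw?
a, b = held_out[0]
x = encode(f"{a}+{b}=").unsqueeze(0)
for _ in range(3):                                # greedily decode up to 3 digits
    next_id = model(x)[0, -1].argmax().item()
    x = torch.cat([x, torch.tensor([[next_id]])], dim=1)
print("".join(VOCAB[i] for i in x[0].tolist()))
```

If the model completes held-out problems correctly, it cannot be doing so by memorizing strings it has seen, which is the whole point of the toy example.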

It’s just predicting symbols, but to do so it had to internalize the concepts.

>internalize the concepts.

This gives the impression that it is doing something more than pattern matching. I think this kind of communication, where some human attribute is used to name a concept in the LLM domain, is causing a lot of damage and ends up inadvertently inflating the hype for AI marketing...

That's the correct impression though.

I think what's causing a lot of damage is the refusal to attribute human qualities to LLMs (carefully, of course). It's not the LLM marketing you have to worry about - that's just noise. All marketing is malicious lies and abusive bullshit; AI marketing is no different.

Care about engineering - designing and securing systems. There, the refusal to anthropomorphise LLMs is doing a lot of damage and wasting effort, with a good chunk of the industry believing in the "lethal trifecta" as if it were the holy Trinity, convinced it's something that can be solved without losing everything that makes LLMs useful in the first place. A little bit of anthropomorphising - squinting your eyes and seeing LLMs as little people on a chip - will immediately tell you these "bugs" and "vulnerabilities" are inseparable facets of the features we care about, fundamental to general-purpose tools. They can be mitigated and worked around (at a cost), but not solved, any more than you can solve "social engineering", or code your employees better so they're impervious to coercion or bribery, or to being prompt-injected by a phone call from their loved one.

Except I actually mean that it infers the concept of adding things from examples. LLMs are amply capable of applying concepts to data that matches patterns never expressed in the training data. It’s called inference for a reason.

Anthropomorphic descriptions are the most expressive because LLMs, being built on human cultural output, intrinsically mimic human behaviours. Other terminology is not nearly as expressive when describing LLM output.

Pattern matching is the same as saying text prediction. While technically truthy, it fails to convey the external effect. Anthropomorphic terms, while less truthy overall, do manage to convey the external effect effectively. They do unfortunately imply an internal cause that does not follow, but the externalities are what matter in most non-philosophical contexts.

>do manage to effectively convey the external effect

But the problem is that this does not inform you about the failure modes. So if I am understanding correctly, you are saying that the behavior of an LLM, when it works, is as if it has internalized the concepts.

But then it does not convey that the model can also say things that completely contradict what it said before, thereby also contradicting the notion of having "internalized" the concept.

So that will turn out to be a lie.

If you look at the failure modes, they very closely resemble the failure modes of humans in equivalent situations. I'd say that, in practice, the anthropomorphic view is actually the most informative one we have about failure modes.

>they very closely resemble the failure modes of humans in equivalent situations

I don't think they do if we are talking about an honest human being.

LLMs will happily hallucinate and even provide "sources" for their wrong responses. That single thing should contradict what you are saying.

It didn't. It predicted symbols.

There was a paper recently that demonstrated that if you input different human languages, the middle layers of the model end up operating on the same probabilistic vectors. It's just the encoding/decoding layers that appear to handle the specific language.

So the conclusion was that these middle layers have their own internal language: the model converts the text into this language and then decodes it back out. It explains why the models sometimes switch to Chinese when they have a lot of Chinese-language inputs, etc.
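For what it's worth, here is a rough way to poke at that claim yourself. This is not the paper's method, just an illustrative sketch assuming the HuggingFace transformers library and the public xlm-roberta-base checkpoint (an encoder model rather than a chat-style LLM): feed translations of the same sentence through a multilingual model and compare hidden states layer by layer.

```python
# Rough probe: how similar are a model's hidden states across languages,
# layer by layer? Assumes `pip install transformers torch`.
import torch
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)
model.eval()

sentences = {
    "en": "The cat is sleeping on the sofa.",
    "fr": "Le chat dort sur le canapé.",
    "zh": "猫在沙发上睡觉。",
}

def layer_vectors(text):
    # Mean-pool each layer's hidden states into one vector per layer.
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states   # embeddings + one entry per layer
    return [h.mean(dim=1).squeeze(0) for h in hidden]

vecs = {lang: layer_vectors(s) for lang, s in sentences.items()}
cos = torch.nn.functional.cosine_similarity

for layer in range(len(vecs["en"])):
    sim_fr = cos(vecs["en"][layer], vecs["fr"][layer], dim=0).item()
    sim_zh = cos(vecs["en"][layer], vecs["zh"][layer], dim=0).item()
    print(f"layer {layer:2d}  en~fr {sim_fr:.3f}  en~zh {sim_zh:.3f}")
```

If the middle layers really do operate on a largely language-agnostic representation, the cross-language similarity should peak somewhere in the middle of the stack and fall off near the input and output layers.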

Ok — that sounds more like a theory rather than an open-and-shut causal explanation, but I’ll read the paper.

You’re a literature cycle behind. ‘Middle-layer shared representations exist’ is the observed phenomenon; ‘why exactly they form’ is the theory.

You are also confusing ‘mechanistic explanation still incomplete’ with ‘empirical phenomenon unestablished.’ Those are not the same thing.

PS. Em dash? So you are some LLM bot trying to bait mine HN for reasoning traces? :D

Oh, Jesus Christ. I learned to write at a college with a strict style guide that taught us how to use different types of punctuation to juxtapose two ideas in one sentence. In fact, they did/do a bunch of LLM work, so if anyone ever used student data to train models, I’m probably part of the reason models write that way.

You sound like you’re trying to sound impressive. Like I said, I’ll read the paper.

Congrats on reading.

Sick burn

Pretty obvious when you consider that neural networks operate on numbers and very complex formulas (built by combining several simple formulas with various weights). You can map a lot of things to numbers (words, colors, music notes, …), but that does not mean the NN is going to produce useful results.
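A toy illustration of that point (made-up weights and vocabulary, nothing more): mapping words to numbers and pushing them through weighted combinations of simple functions produces numbers, but nothing about that, by itself, guarantees useful output.

```python
# A "neural network" is just simple functions combined with weights,
# operating on things that have been mapped to numbers.
import numpy as np

rng = np.random.default_rng(0)

# Map symbols to numbers (here: a crude one-hot encoding of three "words").
vocab = {"cat": 0, "dog": 1, "sofa": 2}
def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# Two layers of "simple formulas with various weights".
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def network(x):
    h = np.maximum(0, W1 @ x + b1)   # weighted sum + simple nonlinearity
    return W2 @ h + b2               # another weighted sum

print(network(one_hot("cat")))       # numbers out; nothing says they're useful yet
```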

Everything is obvious if you ignore enough of the details/problem space. I’ll read the paper rather than rely on my own thought experiments and assumptions.

>It’s literally text prediction, isn’t it?

you are discovering that the favorite luddite argument is bullshit

Feel free to elucidate if you want to add anything to this thread other than vibes.

After you go from millions of params to billions+, models start to get weird (depending on training). Just look at any number of interpretability research papers; Anthropic has some good ones.

> things start to get weird

> just look at research papers

You didn't add anything other than vibes either.

Interesting, what kind of weird?

Getting weird doesn’t mean calling it text prediction is actually ‘bullshit’? Text prediction isn’t pejorative…