To be clear - as of today, many researchers would agree that language is still a uniquely human phenomenon. They discuss this pretty explicitly in the linked article: how important it is to draw a distinction between language and communication. No non-human species has been found to use language under the Chomskian definition (using a finite set of symbols to represent an infinite number of communicable meanings).

However, this "dogma", as you call it, is beginning to weaken as researchers document more nuance and complexity in non-human communication than ever before, and so some researchers have begun to say, "maybe we shouldn't have this all-or-nothing view of language". But it is simply not true that researchers are suppressing evidence of language in animals out of a desire to enslave and torture them.

> No non-human species has been found to use language under the Chomskian definition (using a finite set of symbols to represent an infinite number of communicable meanings)

It's far from clear whether humans themselves meet the Chomskian criteria for language. And Chomskian linguistics has more or less collapsed with the huge success of statistical methods.

Chomsky's poverty of stimulus argument is, if anything, strengthened by LLMs. You need to read the entire internet to make statistical methods work at producing grammatical texts. Children don't read the entire internet but do produce grammatical texts. Therefore &c. QED.

I think this is greatly complicated by the fact that the human brain has been "pre-trained" (in the deep learning sense) by hundreds of millions of years of evolution.

A pre-trained LLM can also learn new concepts from extremely few examples. Humans may still be much smarter, but I think there's a lot of reason to believe that the mechanics are similar.
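
For concreteness, here's a minimal sketch of what "learning from few examples" looks like with a pre-trained LLM (the made-up word "blicket" and the toy task are mine, purely for illustration):

    # The concept "blicket" is defined only inside the prompt, never in
    # the training data - this is few-shot, in-context learning.
    few_shot_prompt = (
        'A "blicket" is any object that is both round and red.\n'
        "Q: Is a tomato a blicket? A: yes\n"
        "Q: Is a banana a blicket? A: no\n"
        "Q: Is a cherry a blicket? A:"
    )
    # Sent to a pre-trained LLM, this typically completes with "yes":
    # the model applies a concept it saw defined only a moment ago.
    print(few_shot_prompt)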

The poverty of the stimulus (POS) argument is that "evolutionary pre-training" in the form of (recursive) grammar is fundamentally required and cannot be inferred from the stimulus.

The argument is based on multiple questionable assumptions of Chomskian linguistics:

- Humans actually learn grammar in the Chomskian way
- Syntax is separate from semantics, so language (utterances) can be learned only from utterances, and not e.g. from what is seen in the environment
- At least in Gold's formalization of the argument, language is learned only from "positive examples", so e.g. the learner can't observe that someone fails to understand some utterance

One could argue for a (very) weak form of POS: that there has to be some kind of "inductive bias" in the learning system, but this applies to all learning, as Kant showed. The inductive bias can be very generic.

> At least in Gold's formalization

It seems to be a persistent myth (possibly revived more recently due to Norvig?) that Chomsky's POS argument has some interesting connection to Gold's theorem. The two things have only a very loose logical connection (Gold's theorem is in no sense a formalization of any claim of Chomsky's), and Chomsky himself never based any of his arguments for innateness on Gold's theorem. Here is a secondary source making the same point (search for 'Gold'): https://stevenpinker.com/files/pinker/files/jcl_macwhinney_c...

The assumption that syntax is 'separate from semantics' also does not figure in any of Chomsky's POS arguments. Chomsky argued that syntax was separate from semantics only in the fairly uncontroversial sense that there are properly syntactic primitives (e.g. 'noun', 'chain', 'c-command') that do not reduce entirely to semantic or phonological notions. But even if that were untrue, it would not undermine POS arguments, which for the most part can be run without any specific assumptions about the syntax/semantics boundary. Indeed, semantic and conceptual knowledge provides an equally fertile source of POS problems.

Yeah, I don't necessarily buy the whole Chomskian program. I'm willing to be persuaded that the reason kids learn to speak despite their individual poverty of stimulus is that there was sufficient empirically experienced stimulus over evolutionary time. The Chomskian grammar stuff seems way too Platonic to be a description of human neuroanatomy. But be that as it may, it's clear the stimulus it takes to train an LLM is orders of magnitude greater than the stimulus necessary to train an individual child, so children must have a different process for language acquisition.

Children do get ~6000 hours a year of stimulus. Spoken, unspoken, written, and body language. Even then they aren't able to form language proficiently until 5 or 6 years old. Does the internet contain 30,000 hours of stimulus?

30,000 hours is about the amount of new video uploaded to YouTube every hour.
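
For scale, a quick back-of-the-envelope comparison, using only the figures quoted in this thread:

    # Back-of-the-envelope, using only numbers quoted in this thread.
    child_hours_per_year = 6_000                     # ~16 waking hours/day
    child_hours_by_age_5 = 5 * child_hours_per_year  # what a 5-year-old has seen
    youtube_hours_per_hour = 30_000                  # new video uploaded per hour
    youtube_hours_per_day = 24 * youtube_hours_per_hour
    print(child_hours_by_age_5)   # 30000
    print(youtube_hours_per_day)  # 720000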

That's astonishing. If you watched all of them, how much new information would you learn? I suspect a large portion of them are the same information presented differently; for example a news story duplicated by hundreds of different channels.

There's a huge amount of video-game footage included in those "hours of video uploaded per hour".

So very, very little new info will be conveyed by the vast majority of the content.

Yeah, I imagine every moment of communication a child receives is new information, not just baby talk about getting the spoon in their mouth and asking them if they have pooped.

> Even then they aren't able to form language proficiently until 5 or 6 years old.

Not even close; they're already proficient at two.

> Does the internet contain 30,000 hours of stimulus?

Is this a joke?

I'm sure someone else could calculate the informational density of all of the text on the internet vs. 30,000 hours of sight, smell, touch, sound, etc. My intuition tells me it's not even close.

Does the information contained in smell and touch contribute to the acquisition of language? Keep in mind you'd be arguing that people born without a sense of smell take longer to develop language, or are otherwise deficient in it in some way. I'm doubtful. It's certainly tricky to measure full sight / sound vs. text, but luckily we don't have to, because we also have video online, which, surprise surprise, utterly dwarfs 30,000 hours of sight and sound in terms of total information.

One qualitative difference is that the child's 30,000 hours are real-time, interactive, and often bespoke to the individual and context. All the videos on YouTube are static and impersonal.

I agree it's not even close! A single day of YouTube uploads alone is 720,000 hours!

I think what he's saying is that "real world" interaction is so high-bandwidth it dwarfs internet (screen-based) stimulation. Not saying I agree, just that he's not comparing hours being alive to hours of YouTube.

> And Chomskian linguistics has more or less collapsed with the huge success of statistical methods.

People have been saying this for decades. But the hype around large language models is finally starting to wane, and I wouldn't be surprised if in another 10 years we hear that we've "finally disproved generative linguistics" (again).

Also, how many R's are in "racecar"?

Counterpoint: What progress has generative linguistics made in the same amount of time that deep learning has been around? It sure doesn't seem to be working well.

Also, the racecar example is an artifact of tokenization in LLMs - they don't actually see the raw letters of the text they read. It would be like me asking you to read this sentence in your head and then tell me which syllable would have the lowest pitch when spoken aloud. Maybe you could do it, but it would take effort, because it doesn't align with the way you're interpreting the input.
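
To make that concrete, here's a minimal sketch (assuming OpenAI's tiktoken library is installed; the exact split is tokenizer-dependent, but the point is that the model sees chunks, not letters):

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("racecar")

    # The model receives opaque token ids, not letters. Words are
    # typically split into multi-character chunks, so "how many R's?"
    # asks about structure the model never directly observes.
    print(tokens)
    print([enc.decode([t]) for t in tokens])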

> What progress has generative linguistics made in the same amount of time that deep learning has been around? It sure doesn't seem to be working well.

Working well for what? Generative linguistics has certainly made progress in the past couple of decades, but it's not trying to solve engineering problems. If you think that generative linguistics and deep learning models are somehow competitors, you've probably misunderstood the former.

Also, being able to count the number of letters in a word is not required for language capability, at least in the Chomskian sense.

> using a finite set of symbols to represent an infinite number of communicable meanings

This always seemed wildly implausible to me. A very large number of communicable meanings, sure, but infinite?

> This always seemed wildly implausible to me. A very large number of communicable meanings, sure, but infinite?

This is "trivial" in the boring kind of way. With just digits, we can communicate an infinite set of distinct numbers simply by counting.

We can't really communicate infinitely many numbers. People just can't read or remember too many digits.

We can. Scientific notation with 1 significant figure can be meaningful because we can use it to figure out order relations (e.g. 3e8 vs. 3e100 is a meaningful comparison at any exponent). It's an infinite language.

David Deutsch claims in “The Beginning of Infinity” this is a property called universality, and that we have it. A short excerpt:

https://www.lesswrong.com/posts/HDyePg6oySYQ9hY4i/david-deut...

The whole book is worth reading, though, as it lays it out in more detail.

Seems trivially demonstrable because you can just chain things forever?

Mary ran after the dog and the dog was brown and a cat came along and…
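
A sketch of why the chaining works: finitely many symbols plus one rule you can apply any number of times already yield unboundedly many distinct sentences (toy grammar and vocabulary made up for illustration):

    import random

    # Toy grammar: a finite vocabulary, but the "clause and clause and ..."
    # rule can be applied any number of times, so the set of derivable
    # sentences is infinite.
    SUBJECTS = ["Mary", "the dog", "a cat"]
    VERBS = ["ran after", "saw", "followed"]

    def clause():
        return f"{random.choice(SUBJECTS)} {random.choice(VERBS)} {random.choice(SUBJECTS)}"

    def sentence(n_clauses):
        # Nothing in the grammar caps n_clauses: each n yields new sentences.
        return " and ".join(clause() for _ in range(n_clauses)) + "."

    print(sentence(1))
    print(sentence(4))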

> you can just chain things forever

I think you're going to find out that no, you can't, and this impossibility is going to trivially demonstrate itself.

Recite "99 bottles of beer on the wall", but start from 1 and change it so the number increases? Stop when there are no remaining numbers or when you reach infinity, whichever comes first.

So, is this a proposal to test how long it takes for you to lose your count?

They are talking as if language were some platonic construct, like a Turing machine with an infinite tape, and you are talking about the concrete reality, where there is no such thing as an infinite tape.

Both viewpoints are useful: they can prove general properties that hold for arbitrarily long sequences of words, and you put a practical bound on that length.

The question is whether humans are capable of infinitely extensible language.

That's clearly false. It's not about some platonic mathematical simplification. Humans patently do not fit the Chomsky criterion for intelligence.

In fact, I'm pretty sure it's physically impossible for any real being to fit it.

Can you say more? English doesn't have any cap on sentence length, so I think I'm missing your point.

> English doesn't have any cap on sentence length

Well, yes and no. Constructing this "infinite" sentence will run into some serious problems once the last star burns out, possibly sooner.

"I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain."

Since English has several possible sentences that are infinite in length, even ones made up of only one word (https://medium.com/luminasticity/grammatical-infinities-what...), I have to agree with all the "this is trivial" comments.

Whatever "finite set of symbols" humans use to communicate is not the finite set of symbols that form letters or words. Communication isn't discrete in practical sense, it's continuous - any symbol can take not just different meanings, but different shades and superposition of meanings, based on the differences in way it's articulated (tone, style of writing - including colors), context in which it shows, and context of the whole situation.

The only way you can represent this symbolically is in the trivial sense in which you can represent everything, because you can use a few symbols to build up the natural numbers, and then use those numbers to approximate everything else. But I doubt that's what Chomsky had in mind.

> "maybe we shouldn't have this all-or-nothing view of language"

That idea seems like a strawperson, not something anyone seriously thinking about it would say. Everyone sees animals communicate.

It's not a question of communication, it's a question of language.

This is not really the point, though. Why is "uses Chomskian language" the criterion for whether or not it's okay to cage and slaughter a living being?

There is, and remains, a desire to explain exactly how it is that humans are different from other animals. Language, or the language faculty, has been touted by some as this thing.

I do not know of anyone who did this. I don't think biologists care about this.

Those researchers are just making noises. It doesn't mean anything.