Strongly agree with this comment. Decoder-only LLMs (the ones we use) are literally Markov Chains; the only (and major) difference is a radically more sophisticated state representation. Maybe "stochastic parrot" sounds overly dismissive, but it's not a fundamentally wrong understanding of LLMs.
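To make the analogy concrete, here's a toy sketch (purely illustrative; `next_token_distribution` is a hand-written stand-in, not any real model API) of generation where the "state" is just the current context window:

```python
import random

# Toy view of decoder-only generation as a Markov chain: the "state"
# is the current context window, and the next-token distribution is a
# function of that state alone. In a real LLM the function below would
# be a transformer forward pass; here it's a hand-written stand-in.

CONTEXT_LENGTH = 3

def next_token_distribution(state):
    # Stand-in for the model: maps the current context (the state)
    # to a probability distribution over a tiny vocabulary.
    vocab = ["the", "cat", "sat", "."]
    return {tok: 1.0 / len(vocab) for tok in vocab}

def step(state):
    probs = next_token_distribution(state)
    token = random.choices(list(probs), weights=list(probs.values()))[0]
    # The next state is again just a token window: the Markov property.
    return (state + (token,))[-CONTEXT_LENGTH:]

state = ("the",)
for _ in range(5):
    state = step(state)
print(state)
```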
The RL claims are also odd because, for starters, RLHF is not "reinforcement learning" by any classical definition of RL (which almost always involves an online component). And further, chat with anyone who has kept up with the RL field and you'll quickly realize that this too is a technology that still hasn't quite delivered on the promises it's been making (despite being an incredibly interesting area of research). There's no reason to speculate that RL techniques will work with "agents" when they have failed to achieve widespread success in similar domains.
I continue to be confused why smart, very technical people can't just talk about LLMs honestly. I personally think we'd have much more progress if we could have conversations like "Wow! The performance of a Markov Chain with proper state representation is incredible, let's understand this better..." rather than "AI is reasoning intelligently!"
I get why non-technical people get caught up in AI hype discussions, but for technical people who understand LLMs it seems counterproductive. Even more surprising to me is that this hype has completely destroyed any serious discussion of the technology and how to use it. There's so much opportunity lost around practical ways of incorporating LLMs into software while people wait for agents to create mountains of slop.
>Decoder-only LLMs (the ones we use) are literally Markov Chains
Real-world computers (the ones we use) are literally finite state machines
Only if the computer you use does not have memory. Definitionally, if you are writing to and reading from memory, you are not using an FSM.
No, it can still be modeled as a finite state machine. Each state just encodes the configuration of your memory. I.e. if you have 8 bits of memory, your state space simply contains 2^8 states, one for each memory configuration.
Any real-world deterministic thing can be encoded as an FSM if you make your state space big enough, since by definition it has only a finite number of states.
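A toy sketch of the idea (nothing here corresponds to a real machine model; it just treats the memory configurations as the state space):

```python
# Toy illustration: a machine with 8 bits of memory viewed as an FSM.
# The state space is every possible memory configuration (2^8 = 256
# states), and one "program step" is just a transition function on
# that finite set. The increment step below is an arbitrary example.

NUM_BITS = 8
NUM_STATES = 2 ** NUM_BITS  # 256 possible memory configurations

def transition(state):
    # Reads "memory" (the state) and writes new memory: any
    # deterministic step of the machine is just state -> state.
    return (state + 1) % NUM_STATES

state = 0
for _ in range(300):
    state = transition(state)
print(state)  # 300 % 256 == 44: always one of the 256 states
```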
You could model a specific instance of using your computer this way, but an FSM representation could not capture the fact that your PC can execute arbitrary programs.
Your computer is strictly more computationally powerful than an FSM or PDA, even though you could represent particular states of your computer this way.
The fact that you can model an arbitrary CFG as a regular language with limited recursion depth does not mean there's no meaningful distinction between regular languages and CFGs.
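A toy example of that distinction (balanced parentheses standing in for an arbitrary CFG; the depth cap is arbitrary): unbounded nesting needs a counter/stack, but capping the depth leaves only finitely many states.

```python
# Recognizing balanced parentheses. Unbounded nesting is context-free,
# not regular, but capping the depth at MAX_DEPTH leaves only finitely
# many states, so the bounded version is a finite state machine.

MAX_DEPTH = 4  # arbitrary cap for the "regular" version

def balanced_bounded(s):
    depth = 0  # the FSM state: one of 0..MAX_DEPTH, plus a reject state
    for ch in s:
        if ch == "(":
            depth += 1
            if depth > MAX_DEPTH:
                return False  # exceeded the finite state space
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(balanced_bounded("(()())"))      # True
print(balanced_bounded("((((()))))"))  # False: deeper than the cap
```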
> you can execute arbitrary programs with your PC represented as an FSM
You cannot execute arbitrary programs with your PC; your PC is limited in how much memory and storage it has access to.
>Your computer is strictly more computationally powerful
The abstract computer is, but _your_ computer is not.
>model an arbitrary CFG as a regular language with limited recursion depth does not mean there's no meaningful distinction between regular languages and CFGs
Yes, this I agree with. But going back to your argument, claiming that LLMs with a fixed context window are basically Markov chains so they can't do anything useful is reductive to the point of absurdity in exactly the same way as claiming that real-world computers are finite state machines.
A more useful argument about the upper bound on computational power would, I think, be along the lines of circuit complexity. But even this does not really matter. An LLM does not need to be Turing complete even conceptually. When paired with tool use, it suffices that the LLM can merely generate programs that are then fed into an interpreter. (And the grammar of a Turing-complete programming language can be made very simple: you can encode Brainfuck in a CFG.) So even if an LLM could only ever produce programs from a CFG grammar, the combination of LLM + Brainfuck executor would give Turing completeness.
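To illustrate that last point, here's a bare-bones Brainfuck evaluator (input `,` omitted; just a sketch of the "LLM emits a program, a dumb executor runs it" split):

```python
# Minimal Brainfuck interpreter: the kind of dumb-but-Turing-complete
# executor an LLM's generated program could be handed to. The LLM only
# needs to emit a string over "+-<>[]."; the interpreter supplies the
# tape and the looping.

def run_bf(program, tape_size=30000):
    tape = [0] * tape_size
    ptr = 0
    # Pre-match brackets so loops can jump both ways.
    jumps, stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    out = []
    pc = 0
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)

# The well-known Brainfuck "Hello World!" program; imagine it arriving
# from an LLM rather than being pasted here.
print(run_bf("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]"
             ">>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."))
```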
Edit: There was this recent HN article along those lines. https://news.ycombinator.com/item?id=46267862.
> so they can't do anything useful
I never claimed that. They demonstrate just how powerful Markov chains can be with sophisticated state representations. Obviously LLMs are useful, I have never claimed otherwise.
Additionally, it doesn't require any logical leaps to understand decoder-only LLMs as Markov chains: they preserve the Markov property and otherwise behave exactly like one. It's worth noting that encoder-decoder LLMs do not preserve the Markov property and cannot be considered Markov chains.
Edit: I saw that post and at the time was disappointed by how confused the author was about those topics and how they apply to the subject.
> why smart, very technical people can't just talk about LLMs honestly
Because those smart people are usually low-rung employees while their bosses are often AI fanatics. Were they to express anti-AI views, they would be fired. Then this mentality slips into their thinking outside of work.