Not to sound condescending, but this reads like someone familiar with LLMs but very unfamiliar with statistics in general.

If we could understand economics, or poverty, or any number of other social structures, simply by cramming data into a statistical model with billions of parameters, we would've done that decades ago and these problems would already be understood.

In the real world, though, there is a phenomenon called overfitting. In other words, you can perfectly model the training data but be unable to make useful predictions about new data (i.e., the future).
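To make the point concrete, here's a minimal sketch of overfitting in its most extreme form: a "model" that just memorizes the training set gets zero training error but is useless on new data. (The toy data and the memorizer function are my own illustration, not anything from the thread.)

```python
import random

random.seed(0)

# Toy data: y = 2x + noise, split into seen (train) and unseen (test) x values.
train = [(x, 2 * x + random.gauss(0, 1)) for x in range(10)]
test = [(x, 2 * x + random.gauss(0, 1)) for x in range(10, 20)]

# The extreme overfit: one "parameter" per training point, i.e. a lookup table.
lookup = dict(train)

def memorizer(x):
    # Perfect recall on points it has seen; falls back to a constant otherwise.
    return lookup.get(x, 0.0)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorizer, train))  # 0.0 -- a perfect fit to the training data
print(mse(memorizer, test))   # large -- no useful predictions about new data
```

Real overfitting in large models is subtler than a lookup table, but the failure mode is the same: training error says nothing about error on the future.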

Deep neural networks can generalize well even when they're far into the overparametrized regime where classical statistical learning theory predicts overfitting. This is usually called "double descent" and there are many papers on it.

I could be wrong, but I think we crossed the 1-billion-parameter threshold in 2019. I'm not sure we've had this ability for decades.

They mean with traditional hard computation, not LLM magic.

> The emerging field of mechanistic interpretability suggests otherwise. Researchers are developing tools to understand how neural networks do what they do, from network ablation and selective activation to feature visualization and circuit tracing. These techniques let you study a trained model the way a biologist studies an organism, through careful experimentation and observation.

Honestly, when I read that part of the article I got the impression the author never studied how computers were made or where the engineering ideas came from, as if all this technology just "popped" into existence and here we are talking about complexity and such like the LLM is truly alive.

The author is not wrong. You seem unaware of how nascent the field of LLM interpretability research is.

See this thread and article from earlier today showing what we're still able to learn from these interpretability experiments.

https://news.ycombinator.com/item?id=47322887

Really good data only goes back a couple of decades or so, so any data you put in your model has only been influenced by the kinds of things we've seen in that time. Impact of a hot war between major powers? The gold standard? Stagflation? Invention of the car or train? Transition of major world powers to democracy or communism? All these events left almost no data compared to today, to say nothing of run-of-the-mill changes in styles of monetary policy, economic drivers, or shifts in style of government.

I think this is a really important distinction to make. The OP seems to be making a fallacious equivocation on the word "parameter" - specifically, any individual "parameter" in a large ML model has no unit of measurement because it doesn't mean anything on its own. I watched a great documentary about the "Soft Hair on Black Holes" paper where they talk about having to move from the blackboard to the computer because the equation explodes into thousands of parameters - the key thing to understand being that each of those parameters represents some "real" thing: a momentum, a charge, a curvature, etc.