I don't buy the narrative that the article is promoting.

I think the machine learning community had largely gotten over its overfitophobia by 2019; people were routinely using overparametrized models that interpolate their training data while still generalizing well.
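
For what it's worth, that interpolation-plus-generalization behavior is easy to reproduce on a toy problem. Here's a minimal sketch (my own made-up random-features setup, not anything from the article): fit a minimum-norm linear model on far more random ReLU features than training points, hit essentially zero training error, and check that the test error stays reasonable.

    # Toy sketch (all numbers/choices are mine): an overparametrized random-feature
    # model that (near-)interpolates noisy training data and still does fine on test data.
    import numpy as np

    rng = np.random.default_rng(0)

    def target(x):
        return np.sin(3 * x)                      # ground-truth function

    n_train, n_feat = 40, 400                     # 10x more parameters than samples
    x_train = rng.uniform(-1, 1, n_train)
    y_train = target(x_train) + 0.1 * rng.normal(size=n_train)
    x_test = rng.uniform(-1, 1, 1000)

    # Random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
    w, b = rng.normal(size=n_feat), rng.normal(size=n_feat)

    def feats(x):
        return np.maximum(0.0, np.outer(x, w) + b)

    # Minimum-norm solution that (near-)interpolates the training set
    coef = np.linalg.pinv(feats(x_train)) @ y_train

    train_mse = np.mean((feats(x_train) @ coef - y_train) ** 2)
    test_mse = np.mean((feats(x_test) @ coef - target(x_test)) ** 2)
    print(f"train MSE: {train_mse:.1e}   test MSE: {test_mse:.3f}")

On this toy setup the training error is essentially zero even though the labels are noisy, and the test error typically stays small, which is the phenomenon the Belkin et al. paper was grappling with.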

The Belkin et al. paper wasn't heresy. The authors were making a technical point - that certain theories of generalization are incompatible with this interpolation phenomenon.

The lottery ticket hypothesis paper's demonstration of the ubiquity of "winning tickets" - sparse parameter configurations that generalize - is striking, but those "winning tickets" aren't the solutions that stochastic gradient descent (SGD) actually finds in practice. In the interpolating regime, the minima found by SGD are simple in a different sense, one perhaps more closely related to generalization: in the case of logistic regression, they are maximum-margin classifiers; see https://arxiv.org/pdf/1710.10345.
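
If you want to see that implicit bias concretely, here's a rough sketch (my own toy data, learning rate, and step count, not anything from the paper): run plain gradient descent on an unregularized logistic loss over separable data, then compare the resulting direction to an (approximate) hard-margin SVM direction.

    # Rough illustration of the implicit bias referenced above: on separable data,
    # gradient descent on the unregularized logistic loss converges in *direction*
    # toward the max-margin separator. Data, learning rate, and iteration count
    # are arbitrary choices of mine.
    import numpy as np
    from scipy.special import expit          # numerically stable sigmoid
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal( 2.0, 0.5, (50, 2)),
                   rng.normal(-2.0, 0.5, (50, 2))])   # two separable blobs
    y = np.hstack([np.ones(50), -np.ones(50)])

    # Plain gradient descent on the logistic loss, no regularization, no bias term
    w = np.zeros(2)
    for _ in range(200_000):                 # directional convergence is slow
        margins = y * (X @ w)
        grad = -(y * expit(-margins)) @ X / len(y)
        w -= 0.1 * grad

    # Approximate hard-margin direction via a linear SVM with a huge C
    w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()

    cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
    print(f"cosine(GD direction, max-margin direction) = {cos:.4f}")

The cosine similarity comes out close to 1: the unregularized logistic solution drifts toward the max-margin direction, which is the kind of simplicity SGD actually delivers in the interpolating regime.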

The article points out some cool papers, but the narrative of plucky researchers bucking orthodoxy in 2019 doesn't track for me.

Yeah, this article gets a whole bunch of history wrong.

Back in the 2000s, the reason nobody was pursuing neural nets was simply a lack of compute power, and the fact that you couldn't iterate fast enough to make smaller neural networks work.

People had been doing genetic algorithms and particle swarm optimization (PSO) for quite some time. Everyone knew that high dimensionality was the solution to overfitting - the more directions you can use to climb out of valleys, the better the system performed.