Hacker News

Not a mathematician, so I’m immediately out of my depth here (and butchering terminology), but it seems, intuitively, like the presence of a massive number of local minima wouldn’t really be relevant for gradient descent. A given local minimum would need a “well” at least as large as your step size to reasonably capture your descent.

E.g. you could land perfectly on a local minimum, but you won’t stay there unless your step size is minute or the minimum is quite substantial.
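
To make that intuition concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: a 1-D "loss" that is a wide bowl with one narrow Gaussian dent near x = 1, plain fixed-step gradient descent, and hand-picked constants (`DENT_CENTER`, `DENT_WIDTH`, `DENT_DEPTH`, and the two step sizes). It says nothing about real high-dimensional losses.

```python
import math

# Hypothetical 1-D loss: a wide bowl 0.5*x^2 (global minimum near x = 0)
# with one narrow Gaussian dent that creates a small local minimum near x = 1.
DENT_CENTER = 1.0
DENT_WIDTH = 0.01   # the "well" is only a few hundredths wide
DENT_DEPTH = 0.05


def loss(x):
    d = x - DENT_CENTER
    return 0.5 * x * x - DENT_DEPTH * math.exp(-d * d / (2 * DENT_WIDTH ** 2))


def grad(x):
    # Derivative of loss(x): bowl slope plus the slope of the dent.
    d = x - DENT_CENTER
    bump = math.exp(-d * d / (2 * DENT_WIDTH ** 2))
    return x + DENT_DEPTH * d / DENT_WIDTH ** 2 * bump


def descend(x0, step_size, n_steps=5000):
    # Plain gradient descent with a fixed step size, no momentum or noise.
    x = x0
    for _ in range(n_steps):
        x -= step_size * grad(x)
    return x


if __name__ == "__main__":
    for step_size in (0.1, 0.001):
        x_final = descend(2.0, step_size)
        print(f"step_size={step_size}: x={x_final:.4f}, loss={loss(x_final):.4f}")
```

With these assumed constants, the run with step size 0.1 takes steps several times wider than the dent, hops over it, and ends up at the bottom of the bowl, while the run with step size 0.001 creeps into the dent and stays there. Widen the dent (a larger `DENT_WIDTH` with `DENT_DEPTH` scaled up to match) and the larger step can get caught too.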