Does this mean that if your model is "overfitting", the solution is to train for even more epochs?
Maybe. Just means that the conventional wisdom was wrong and substantially over training can be a good thing. No one I knew at the time suspected that, including the people who wrote the paper.
Maybe. Just means that the conventional wisdom was wrong and substantially over training can be a good thing. No one I knew at the time suspected that, including the people who wrote the paper.