I think it's a solid theoretical contribution, but it may still lack practical relevance if some of their assumptions and approximations turn out to be too unrealistic. One way this could happen, for example, would be if typical training batches already yield gradients with a high enough signal-to-noise ratio that their optimizer tweak ends up not tweaking much. Their somewhat unusual selection of experiments makes me suspect this might be the case.
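To make the signal-to-noise concern concrete, here is a toy sketch (not the paper's method; the problem setup and all names are illustrative) of measuring per-coordinate gradient SNR across examples in a batch: if the ratio of the mean gradient to its per-example spread is already large, a noise-robust optimizer modification has little room to change the update direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem: per-example gradients of squared error.
n, d = 512, 8
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)                                  # current parameters
residual = X @ w - y
per_example_grads = 2 * residual[:, None] * X    # shape (n, d)

# Gradient SNR per coordinate: |mean over examples| / std over examples.
g_mean = per_example_grads.mean(axis=0)
g_std = per_example_grads.std(axis=0) + 1e-12
snr = np.abs(g_mean) / g_std
print(snr)
```

High SNR values here would mean the averaged batch gradient already points in a reliable direction, which is exactly the regime where a noise-aware tweak would matter least.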
I read the paper earlier when it showed up on https://news.ycombinator.com/from?site=arxiv.org. The blog post's writing style turned me off, so I didn't bother checking how much it overhypes the results relative to the paper, but a lot of people certainly seem to have gotten the idea that this must be big if true, whereas I think it's better classified as neat, but not revolutionary.
Thanks. I found the DPO test very interesting, where they intentionally twiddled the dataset to create 'noise'. That, I think, would back your concern. On the other hand, a method that tolerates noisier data, or does better on datasets that are low-signal across the board, would be nice to have.
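For readers unfamiliar with that kind of test: a common way to inject deliberate noise into a DPO-style preference dataset is to flip the (chosen, rejected) ordering for a random fraction of pairs. This is a hypothetical sketch of that idea, not the paper's actual procedure, which may differ:

```python
import random

def inject_preference_noise(pairs, flip_rate=0.2, seed=0):
    """Swap (chosen, rejected) in a random fraction of preference pairs.

    `pairs` is a list of (chosen, rejected) response tuples. Flipping a
    pair corrupts its preference label, mimicking annotation noise.
    """
    rng = random.Random(seed)
    noisy = []
    for chosen, rejected in pairs:
        if rng.random() < flip_rate:
            noisy.append((rejected, chosen))  # corrupted label
        else:
            noisy.append((chosen, rejected))
    return noisy

pairs = [("good answer", "bad answer")] * 10
noisy = inject_preference_noise(pairs, flip_rate=0.3)
```

Training on `noisy` versus `pairs` and comparing the resulting policies is then a direct probe of how robust the method is to low-signal preference data.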