> The algorithmic problem they need to solve is how to harness Gibbs sampling for large-scale ML tasks, but arguably this isn't really a huge leap,

Is it?
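For context, the basic primitive itself is simple; here's a minimal block-Gibbs pass for a binary RBM in NumPy (the sizes, names, and the RBM choice are all my own illustration, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 128  # 28x28 pixels; hidden size is a made-up choice

W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)  # visible biases
b_h = np.zeros(n_hidden)   # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    # Block-Gibbs: sample hiddens given visibles, then visibles given hiddens.
    h = (rng.random(n_hidden) < sigmoid(v @ W + b_h)).astype(float)
    v = (rng.random(n_visible) < sigmoid(h @ W.T + b_v)).astype(float)
    return v

v = rng.integers(0, 2, size=n_visible).astype(float)
for _ in range(100):
    v = gibbs_step(v)  # the math is cheap; the question is energy and scale
```

The math here is decades old; if there is a leap, it's in running something like this within hardware-level energy budgets, which is exactly what the paper would need to quantify.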

The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on larger data? I assume not yet; otherwise they'd have put something more impressive in Figure 1.

In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1 J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement is negligible even if the savings for that step are enormous.
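To make that concrete, here's the back-of-the-envelope version (the 1% fraction and the 100x speedup are made-up numbers, purely illustrative):

```python
def overall_saving(fraction, step_speedup):
    """Amdahl-style bound: total improvement when only `fraction` of the cost shrinks."""
    return 1 - ((1 - fraction) + fraction / step_speedup)

# A 100x energy win on a step that is 1% of the budget saves under 1% overall:
print(f"{overall_saving(0.01, 100):.4%}")  # -> 0.9900%
```

So unless sampling dominates the training budget, even enormous per-step savings barely move the total.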

I've written a few CS papers myself back in the day, and the general idea was always to put the best results up front. So either they are bad communicators, or they don't highlight answers to these questions because they don't have many impressive results (yet?). Their website is nifty, so I suspect the latter.