The fast interconnect between nodes has applications in inference at scale (large KV caches and other semi-durable state, multi-node tensor parallelism on very large models).
But this article in particular emphasizes extreme performance ambitions for columnar data processing with hardware acceleration. That's relevant to many ML training scenarios, but also to other massive MapReduce-style (or at least MapReduce-scale) workloads. There are lots of applications for a "magic petabyte-plus DataFrame", which I don't think is solved in the general case.