This is very cool, though I don't understand exactly what they've done here. Is it some kind of LLM with convolutional layers added?

The graph doesn't exactly make it clear but it describes a pipeline that goes beyond the LLM, so the CNN could be a separate model there.

Here’s the academic paper behind it: https://arxiv.org/abs/2602.04101