This is very cool, though I don't understand exactly what they've done here. Is it some kind of LLM with convolutional layers added?
The graph doesn't make it entirely clear, but it describes a pipeline that goes beyond the LLM, so the CNN could be a separate model there.
Here’s the academic paper behind it: https://arxiv.org/abs/2602.04101