I think you missed the point. Would you call an image analysis library to describe an image or reason over a sequence of images? Check out some of the plots in the paper to see what these models can do.

I would if the image analysis library were backed by a VLM. I haven't read the paper in full, but couldn't Figure 6 have been produced by an LLM writing a script that calls libraries for time series feature extraction and runs a hypothesis test or whatever? The libraries would do the heavy lifting and return a likelihood ratio or some other statistic that the LLM can interpret.
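
To make that concrete, here is a rough sketch of the kind of script I have in mind. It is not taken from the paper and makes no claim about how Figure 6 was actually produced. The function names (`extract_features`, `mean_shift_lrt`) are hypothetical, the data is synthetic, and I'm using only the Python standard library as a stand-in for a real feature-extraction or stats package. The test assumes Gaussian noise with a shared variance and a known split point.

```python
import math
import random
import statistics


def extract_features(series):
    """Summarize a series with a few simple, interpretable features (hypothetical helper)."""
    n = len(series)
    mean = statistics.fmean(series)
    var = statistics.pvariance(series, mu=mean)
    # Lag-1 autocorrelation: how strongly each point resembles the previous one.
    if var > 0:
        lag1 = sum(
            (series[i] - mean) * (series[i - 1] - mean) for i in range(1, n)
        ) / (n * var)
    else:
        lag1 = 0.0
    # Least-squares slope of the values against the time index.
    t_mean = (n - 1) / 2
    slope = sum((i - t_mean) * (x - mean) for i, x in enumerate(series)) / sum(
        (i - t_mean) ** 2 for i in range(n)
    )
    return {"mean": mean, "std": math.sqrt(var), "lag1_autocorr": lag1, "slope": slope}


def mean_shift_lrt(series, split):
    """Gaussian likelihood-ratio test for a change in mean at index `split` (hypothetical helper)."""
    n = len(series)
    left, right = series[:split], series[split:]
    # MLE variance under H0 (one mean) and under H1 (two means, shared variance).
    var0 = statistics.pvariance(series)
    m_left, m_right = statistics.fmean(left), statistics.fmean(right)
    var1 = (
        sum((x - m_left) ** 2 for x in left) + sum((x - m_right) ** 2 for x in right)
    ) / n
    if var1 == 0:
        raise ValueError("segments have zero variance; the test is undefined")
    # Clamp at zero to absorb floating-point rounding when var0 is about equal to var1.
    stat = max(0.0, n * math.log(var0 / var1))
    # Under H0 the statistic is asymptotically chi-square with 1 degree of freedom,
    # whose upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return {"lr_statistic": stat, "p_value": p_value}


# Synthetic example: the mean jumps from 0 to 2 halfway through the series.
random.seed(0)
series = [random.gauss(0, 1) for _ in range(100)] + [
    random.gauss(2, 1) for _ in range(100)
]
report = {
    "features": extract_features(series),
    "test": mean_shift_lrt(series, split=100),
}
print(report)
```

The LLM never touches the raw series here. It only sees the small `report` dict, which is the "statistic that the LLM can interpret" part of my argument.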