It's hard to tell. Training it on lots of examples of ffmpeg usage would not be surprising, and training it on screenshots would also make a lot of sense. It's not at all inconceivable that they'd train it on "figure out a video by creating contact sheets". I'd consider training on the whole end-to-end workflow less likely, but it'd also be a very small leap once you have the elements.
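
For anyone who hasn't seen the contact sheet trick, here is a rough sketch of what that step might look like. It only builds the ffmpeg command line rather than running it. The function name, file names and default values are made up for illustration; the `fps`, `scale` and `tile` filters are standard ffmpeg filters, but the sampling interval and grid size are just guesses at something reasonable.

```python
import shlex


def contact_sheet_cmd(video, out="sheet.png", every_s=10, cols=4, rows=4, width=320):
    """Build an ffmpeg command that packs sampled frames into one grid image.

    Hypothetical helper: all names and defaults here are illustrative.
    """
    # fps=1/N keeps roughly one frame every N seconds,
    # scale shrinks each frame to the given width (height follows the aspect ratio),
    # tile lays the sampled frames out in a cols x rows grid.
    vf = f"fps=1/{every_s},scale={width}:-1,tile={cols}x{rows}"
    # -frames:v 1 stops after the first completed grid image.
    return ["ffmpeg", "-i", video, "-vf", vf, "-frames:v", "1", out]


# Print the command instead of executing it, so this runs without ffmpeg installed.
print(shlex.join(contact_sheet_cmd("input.mp4")))
```

With these defaults a single sheet covers about 160 seconds of video (16 tiles, one frame every 10 seconds), so a longer video would need a larger interval or several sheets.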

I think a lot will fall out naturally from relatively modest levels of reasoning plus in-depth knowledge of what common tools will do. E.g. I've also used Claude to debug my compiler, and it knows gdb so much better than I do that, even though I know it's pretty useless at holding context while reading through an assembly listing (lack of structure, I suspect), it's surprisingly good at working things out just by being good at exploiting a powerful tool.