tl;dr:
- Client makes a request to the server (this opens a bidirectional network socket)
- Server uses that bidirectional socket and spawns a local, patched ffmpeg with VFS-like characteristics
- ffmpeg (using the client-server bidirectional socket) performs input/output operations, treating the client's filesystem as if it were local
Thus the client doesn't need to open any ports or expose its filesystem through a traditional mount, and one server can handle the filesystems and requests of any number of clients.
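
To make the flow above concrete, here is a rough Python sketch of the idea, not the project's actual protocol or code. Everything in it is an assumption made for illustration: the length-prefixed JSON wire format, the names `client_serve` and `remote_read`, and the use of `socket.socketpair()` as a stand-in for the network connection the client opens. The patched ffmpeg itself is not shown; `remote_read` plays the role of one of its read operations.

```python
import json
import os
import socket
import struct
import tempfile
import threading


def send_msg(sock, payload: bytes) -> None:
    # Hypothetical framing: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def client_serve(sock) -> None:
    """Client side: answer read requests against the local filesystem."""
    while True:
        try:
            req = json.loads(recv_msg(sock))
        except ConnectionError:
            return  # server closed the socket, nothing left to serve
        with open(req["path"], "rb") as f:
            f.seek(req["offset"])
            send_msg(sock, f.read(req["length"]))


def remote_read(sock, path: str, offset: int, length: int) -> bytes:
    """Server side: read from the client's filesystem over the socket."""
    req = {"path": path, "offset": offset, "length": length}
    send_msg(sock, json.dumps(req).encode())
    return recv_msg(sock)


def demo() -> bytes:
    # socketpair() stands in for the connection the client initiated.
    client_sock, server_sock = socket.socketpair()
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello from the client filesystem")
        path = f.name
    worker = threading.Thread(target=client_serve, args=(client_sock,), daemon=True)
    worker.start()
    try:
        return remote_read(server_sock, path, 6, 4)
    finally:
        server_sock.close()
        worker.join(timeout=1)
        client_sock.close()
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Note that the client only ever answers requests on a connection it opened itself, which is why no listening port or mount is needed on its side. A real implementation would also need writes, seeks, error reporting, and some restriction on which paths the server may touch.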