It does take videos (like mp4) as input, but it will only output the stripped audio track.

I might add custom filler word functionality, or perhaps just make the filler word list configurable.
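
A rough sketch of what a configurable filler word list could look like, assuming a plain-text file with one filler word per line. The default list and the names `load_fillers` and `is_filler` are hypothetical, made up for illustration, and not part of the tool as it exists today.

```python
from pathlib import Path

# Hypothetical defaults; the tool's real built-in list may differ.
DEFAULT_FILLERS = frozenset({"um", "uh", "er", "like"})


def load_fillers(path=None):
    """Return the filler word set, read from `path` if given, else the defaults."""
    if path is None:
        return set(DEFAULT_FILLERS)
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # One entry per line; skip blank lines and '#' comments, lowercase the rest.
    return {
        line.strip().lower()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    }


def is_filler(word, fillers):
    """True if a transcribed word, minus surrounding punctuation, is in the set."""
    return word.strip(".,!?;:").lower() in fillers
```

With something like this, a user-supplied file would fully replace the defaults rather than extend them; merging the two sets would be an equally reasonable choice.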