Neat idea.

A friend of mine made a similar animated-GIF captcha a few years ago. It was built from multiple scrolling horizontal bars, each revealing its portion of the underlying image (including the letters). He made a (friendly) bet that it would be pretty hard to solve.

Solving it was easy enough. I grabbed the entire set of frames, greyscaled them, averaged over all of them, and then applied a few minor fixups like thresholding and contrast adjustment. The letters were revealed in more frames than not, though I don't think the difficulty would have changed much if that weren't the case. After that, the rest of the image was pretty amenable to character recognition.
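For anyone curious, here's a rough pure-Python sketch of the averaging-plus-fixups step. This is not the code I actually used. It assumes the frames are already greyscale 2D lists of 0-255 values, with dark letters on a light background. The cutoff of 128 and the min/max contrast stretch are illustrative choices. Decoding the GIF and running OCR are left out.

```python
def average_frames(frames):
    """Per-pixel mean over equally sized greyscale frames (values 0-255)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / n for x in range(width)]
        for y in range(height)
    ]

def stretch_contrast(image):
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[(v - lo) * 255 / (hi - lo) for v in row] for row in image]

def threshold(image, cutoff=128):
    """Binarise: pixels at or above the (assumed) cutoff go white, the rest black."""
    return [[255 if v >= cutoff else 0 for v in row] for row in image]

def clean_captcha(frames, cutoff=128):
    """Average all frames, stretch contrast, then threshold, ready for OCR."""
    return threshold(stretch_contrast(average_frames(frames)), cutoff)

# A 1x3 toy example. Pixel 0 is a letter stroke, shown dark in 3 of 4 frames.
# Pixel 1 is plain background. Pixel 2 is background covered by a bar in one frame.
frames = [
    [[0, 255, 255]],
    [[0, 255, 0]],
    [[0, 255, 255]],
    [[255, 255, 255]],
]
print(clean_captcha(frames))  # [[0, 255, 255]]
```

Because the letter pixels are dark in most frames, the mean keeps them darker than the background even after occasional occlusion. The contrast stretch and threshold then turn that difference into a clean black-and-white image.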

That's reminiscent of a (possibly apocryphal?) method I once read about for getting "clean" images of normally crowded public places: take multiple photos over time, then take the per-pixel median. I never had the opportunity to try it myself, but it sounded plausible as a way to get rid of transient "noise" from an otherwise static image.
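As a rough illustration of why the median works here, assume the photos are perfectly aligned greyscale 2D lists. A colour image would need the same treatment per channel. As long as a passer-by covers a given pixel in fewer than half the shots, that pixel's median is the background value. A mean, by contrast, would leave a faint smear.

```python
from statistics import median

def median_stack(frames):
    """Per-pixel median over aligned, equally sized greyscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]

# Background brightness is 100; a "person" (brightness 20) walks across a
# 1x3 strip, occupying a different pixel in each of three shots.
shots = [
    [[20, 100, 100]],
    [[100, 20, 100]],
    [[100, 100, 20]],
]
print(median_stack(shots))  # [[100, 100, 100]]: the person is gone
# A per-pixel mean would give about 73.3 everywhere instead.
```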

That's a real method:

https://digital-photography-school.com/taking-photos-in-busy...

https://petapixel.com/2019/09/18/how-to-shoot-people-free-ph...

But it only works well if the crowds move out of the way reasonably quickly. If we're talking about areas so packed that people block a given spot for long stretches, and you need hours of shots, the change in ambient lighting over that time will degrade the final photo.

Ah, that's the method indeed! Thanks!

There's an analogue method too: just take a very long exposure of a busy street. A moving person only reflects light onto any given spot of the film/sensor for a tiny fraction of the exposure. Relatively speaking, they barely register, so they end up 'invisible' in the final shot. That's why old street photos are sometimes empty of people: both lenses and film were slow, which meant long exposures.
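Some back-of-the-envelope arithmetic shows why. This is a simplified model and an assumption on my part: a pixel's recorded value is the time-weighted average of the light reaching it, with the exposure set so the static scene comes out correctly. It ignores film non-linearities such as reciprocity failure. The brightness values and step counts are made up for illustration.

```python
def long_exposure_pixel(background, person, steps_total, steps_occupied):
    """Time-averaged brightness of one pixel over an exposure split into
    steps_total equal time steps, with a person covering it for
    steps_occupied of them (simplified linear model, assumed)."""
    occupied = person * steps_occupied
    empty = background * (steps_total - steps_occupied)
    return (occupied + empty) / steps_total

# Hypothetical numbers: background 200, a darker passer-by at 50.
# Long exposure: 600 time steps, person over this pixel for only 3 of them.
print(long_exposure_pixel(200, 50, 600, 3))  # 199.25, barely changed
# Short exposure taken while the person is in frame: fully recorded.
print(long_exposure_pixel(200, 50, 1, 1))    # 50.0
```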