This is pretty cool as-is, but I can't help trying to think up ways to increase the speed. (Not the point, I know.) With some optimizations, it seems like it should be able to do a whole column pretty quickly, for example if the device that turns a block could do so without the x-axis alignment needing to change. Or perhaps it'd be better to do rows instead of columns, since the y-axis alignment shouldn't need to change with the current device. As for the block-turning device itself, I think some sort of rotating mechanism would speed things up, since you wouldn't need to reset it. I bet a manufacturing automation specialist could get this thing cruising...
BTW I love that you initially went with a very direct e-ink analog with the balls!
If you had a rotating mechanism that allows slip, you could have rotor shafts that rotate all the blocks in a column while braking mechanisms prevent all the blocks in a given row from moving. Or you could have both rotor rows and rotor columns if you implement a rough mechanical equivalent of the hysteresis systems of ferrite-core memory. Or (I think GistNoesis suggested something similar: https://news.ycombinator.com/item?id=44794092 ) if you hide a neodymium permanent magnet just under one corner of each block, you could use a pair of electromagnets behind that block to pull it from either orientation into the other. That would be almost a solid-state solution, apart from the axle the block rotates on, and potentially one that could set the entire display at once.
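To make the column-rotor-plus-row-brake idea a bit more concrete, here's a rough Python sketch of how the addressing might be scheduled. Everything in it is an assumption for illustration: blocks are modeled as two-state (0 or 1), a rotor pass flips every unbraked block in its column, only one column rotor runs at a time, and the names `column_passes` and `apply_passes` are made up.

```python
def column_passes(current, target):
    """Plan one rotor pass per column that needs changes.

    Returns a list of (column, braked_rows): spin that column's rotor
    while the listed row brakes hold their blocks still.
    Assumes a two-state block that a rotor pass toggles (hypothetical model).
    """
    rows, cols = len(current), len(current[0])
    passes = []
    for c in range(cols):
        # Rows whose block in this column must flip to match the target.
        flip_rows = {r for r in range(rows) if current[r][c] != target[r][c]}
        if not flip_rows:
            continue  # nothing to do in this column, skip the pass
        braked_rows = [r for r in range(rows) if r not in flip_rows]
        passes.append((c, braked_rows))
    return passes


def apply_passes(current, passes):
    """Simulate the planned passes on a copy of the grid."""
    grid = [row[:] for row in current]
    for c, braked_rows in passes:
        held = set(braked_rows)
        for r in range(len(grid)):
            if r not in held:
                grid[r][c] ^= 1  # unbraked block gets turned by the rotor
    return grid
```

In this model the whole display updates in at most one pass per column, regardless of how many blocks in each column change, which is where the speedup over moving to each block individually would come from.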
Thanks for thinking through it! I've found that moving left-right is a little noisier and has a little vibration; up-down is smoother. However, it's not that noisy, and it'd be fun to experiment with different algorithms for finding the next pixel.
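As one possible starting point for that kind of experiment, here's a minimal sketch of a greedy "next pixel" picker that charges more for left-right travel than for up-down travel. It is not the algorithm the display actually uses; the function names, the `(x, y)` coordinate convention, and the cost weights are all assumptions.

```python
def next_pixel(current, pending, horizontal_cost=2.0, vertical_cost=1.0):
    """Pick the pending pixel that is cheapest to reach from `current`.

    Pixels are (x, y) tuples. Left-right travel (x) is weighted more
    heavily than up-down travel (y); the weights are made-up defaults.
    Ties are broken by the pixel coordinates so the result is deterministic.
    """
    cx, cy = current
    return min(
        pending,
        key=lambda p: (
            horizontal_cost * abs(p[0] - cx) + vertical_cost * abs(p[1] - cy),
            p,
        ),
    )


def plan_path(start, targets, horizontal_cost=2.0, vertical_cost=1.0):
    """Greedily order `targets` by repeatedly taking the cheapest next pixel."""
    pending = set(targets)
    position = start
    order = []
    while pending:
        position = next_pixel(position, pending, horizontal_cost, vertical_cost)
        pending.remove(position)
        order.append(position)
    return order
```

With the default weights, `plan_path((0, 0), [(1, 0), (0, 1), (0, 2)])` finishes the pixels in column 0 before stepping sideways to `(1, 0)`. A greedy pass like this won't find the best overall route, but swapping the cost function is an easy way to compare strategies.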