Another idea: have the cubes point an edge straight forward (instead of a face). Then if each cube has two adjacent dark sides and two adjacent light sides, one could set up two ‘simultaneous’ images: one viewed from the left at 45° and another viewed from the right. (Each pixel would have four possibilities.)
If you're willing to sacrifice a color, just use triangles/prisms. The faces could then be virtually adjacent and still rotate independently.
https://excalidraw.com/#json=driyv7dR-eODBzuh_hdrk,93QQvkYae...
I guess the patents are long expired now and don't really apply to pixels, but the concept already exists for non-pixelated images. Sadly, these have mostly been replaced by LEDs in the wild:
https://www.rotapanel.com/trivision-mechanism-and-prism-type...
Similarly, the camera could stay face-on and double the pixel count with largely the same hardware.
For this to work, you'd want two adjacent faces painted, rather than opposite faces being painted, which seems to be how they're currently done (unless they only have one face painted?). Then the four possible rotations would allow for each possible pixel-pair. (The cubes could perhaps instead be squat rectangular prisms, to correct the aspect ratio, too.)
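A quick sketch of why adjacent painting matters here, in Python. The face ordering, the "left face = index r, right face = index r+1" convention, and all the names are my own assumptions for illustration, not anything from the actual build:

```python
from itertools import product

# Colours of the four side faces, listed in order around the rotation axis.
ADJACENT = ("dark", "dark", "light", "light")   # two adjacent faces painted
OPPOSITE = ("dark", "light", "dark", "light")   # opposite faces painted

def visible_pairs(faces):
    """Map each (left, right) colour pair to a quarter-turn index that shows it.

    Assumes the cube sits edge-forward, so rotation r exposes face r to the
    left-hand viewer and face r+1 to the right-hand viewer.
    """
    n = len(faces)
    return {(faces[r], faces[(r + 1) % n]): r for r in range(n)}

def rotations_for(left_image, right_image, faces=ADJACENT):
    """Pick a rotation per cube so each viewer sees their own image."""
    lookup = visible_pairs(faces)
    return [[lookup[(l, r)] for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left_image, right_image)]

all_pairs = set(product(("dark", "light"), repeat=2))
print(set(visible_pairs(ADJACENT)) == all_pairs)  # adjacent painting covers every pair
print(set(visible_pairs(OPPOSITE)) == all_pairs)  # opposite painting does not
```

With opposite faces painted, the two visible faces always differ, so only the two mixed pairs are reachable.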
Likewise, if you generalize to a 3-face array, you'd need an octagonal unit painted in a binary de Bruijn sequence of order 3 (length 2^3 = 8)...
... But that's as far as you could take it, since 16-gons would show at least 7 faces while only having an encoding for 4.
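For anyone who wants to check the octagon painting, here is a rough Python sketch. It uses the standard Lyndon-word construction for de Bruijn sequences; the function names and the 0 = dark / 1 = light convention are just assumptions for illustration:

```python
def de_bruijn(k, n):
    """Return a cyclic de Bruijn sequence over k symbols with window length n."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

def windows(seq, n):
    """All length-n runs of consecutive faces, wrapping around the prism."""
    m = len(seq)
    return [tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)]

faces = de_bruijn(2, 3)            # 8 faces, 0 = dark, 1 = light
triples = windows(faces, 3)        # what each of the 8 rotations would show
print(faces)
print(len(set(triples)) == 2 ** 3) # every 3-pixel combination appears once
```

Each of the 8 rotations shows a different run of three consecutive faces, so any 3-pixel combination can be dialled in by picking the right rotation.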
I also thought of using hexagonal prisms, showing two faces at a time in paired colours but using three colours. These would also need much less clearance in order to rotate freely, compared to face-on cubes.
Or paint the 4 faces RGBK or CMYK to get a colour display?