Honestly, this is the first one where I would have guessed "this is a pelican riding a bicycle" if presented with just the image and no other context. This and the voxel tower are fairly impressive - we're seeing some semblance of visual/spatial understanding with this model.