It's the inaccuracy of things like shadows, sub-surface scattering and specular highlights. I think the shadow inaccuracy is what the human visual system is most sensitive to.
These LLMs might make content that looks impressive at first, but they are absolutely not performing physically based rendering, nor do they have any awareness of the lighting arrangement in these scenes. They get a lot of things right, but you only have to screw up one small element to throw the whole thing off.
I am willing to bet that Unreal Engine 5 will continue to produce more realistic human faces than OpenAI ever can with these types of models. You cannot beat the results of actually running ray tracing in a PBR pipeline.
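
To make the shadow point concrete, here is a rough sketch of the kind of test a ray tracer runs for every shaded point: cast a ray toward the light and check whether anything blocks it. This is not how Unreal Engine 5 or any particular renderer implements it. The function names (`ray_hits_sphere`, `in_shadow`) and the spheres-only scene are my own assumptions for illustration.

```python
import math

def ray_hits_sphere(origin, direction, center, radius, max_t):
    """True if origin + t*direction hits the sphere for some 0 < t < max_t.

    Assumes `direction` is unit length (hypothetical helper, illustration only).
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))   # half the linear coefficient
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return False                                # ray misses the sphere
    root = math.sqrt(disc)
    eps = 1e-6                                      # avoid self-intersection at t ~ 0
    return any(eps < t < max_t for t in (-b - root, -b + root))

def in_shadow(point, light_pos, occluders):
    """Shadow ray: is any occluder between `point` and the light?

    `occluders` is a list of (center, radius) spheres, an assumed scene format.
    """
    to_light = [l - p for l, p in zip(light_pos, point)]
    dist = math.sqrt(sum(x * x for x in to_light))
    direction = [x / dist for x in to_light]
    return any(ray_hits_sphere(point, direction, c, r, dist) for c, r in occluders)

light = (0.0, 10.0, 0.0)
scene = [((0.0, 5.0, 0.0), 1.0)]                    # one sphere hanging under the light
directly_below = in_shadow((0.0, 0.0, 0.0), light, scene)
off_to_the_side = in_shadow((5.0, 0.0, 0.0), light, scene)
```

The shadow boundary here falls out of the geometry and the light position, so it is consistent everywhere in the frame. A model that only imitates what shadows tend to look like has no such constraint.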