We said the same thing about language, to be honest: the nuances of words and concepts were supposedly too hard for a word generator to put together correctly, and now we have LLMs. We said the same thing about video generation, where the nuances of light, shadow, and micro-expressions would be hard to replicate, and generative models are doing a pretty good job with that. We're just waiting for physical LLMs; it will happen at some point.

As another poster pointed out, nobody has come close to duplicating the sensors in our fingers.

I don't think it needs to be all that complicated for the vast majority of tasks. It might not be able to twirl a pen around its fingers, but it should be able to hold one and write something.