An issue not brought up is that LLMs aren't deterministic enough to reliably follow rules -- it would be nice if we had a perfect robot that could do all these things, and then we could work out rules for it to follow. But it only took some prompt tampering for Grok to start talking about MechaHitler, and I'm pretty sure that at least wasn't entirely planned. Inconsistency is almost to be expected from LLMs.