Yup, LLMs are not "artificial intelligence" - they just generate the most probable next token, at least until their authors hardcode functionality for specific community tests.
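
For what it's worth, the core loop really is just repeated next-token selection. Here's a minimal, purely illustrative sketch of greedy decoding; the `toy_model` scorer is a made-up stand-in for a real transformer's logits, not anyone's actual implementation:

```python
# Minimal sketch of greedy next-token decoding (illustrative only).
import math

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(context):
    """Hypothetical scorer: returns one logit per vocabulary token.
    A real LLM computes these with a neural network; here we just
    favour tokens that haven't appeared in the context yet."""
    return [1.0 if tok not in context else -1.0 for tok in VOCAB]

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def greedy_decode(context, max_new_tokens=4):
    for _ in range(max_new_tokens):
        probs = softmax(toy_model(context))
        # "Generate the most probable token": take the argmax each step.
        next_tok = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
        context = context + [next_tok]
    return context

print(greedy_decode(["the", "cat"]))
# -> ['the', 'cat', 'sat', 'on', 'mat', '.']
```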

Yes, in theory that's what an LLM is and how it works, but I think we're a little bit past the "expensive auto-complete" analogy, given all the layers of wrappers we've built on top of LLMs to package them into the applications being interacted with here, no?

Fundamentally, though, there is missing-but-implied information here that the LLM can't seem to surface, no matter how many times it's asked to check itself. I wonder what other questions like this could be asked with similar results.