Is this a fundamental issue with any LLM, or is it an artifact of how a model is trained, tuned and then configured or constrained?

A model that I call through, e.g., LangChain with constraints, system prompts, embeddings, and so on will react very differently from when I pose the same question through the AI provider's public chat interface.
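
For illustration, here is a minimal sketch of what I mean (this assumes the `langchain-openai` package and an `OPENAI_API_KEY` in the environment; the model name and the system prompt are just placeholders): the same underlying model, called once bare and once with a custom system prompt and sampling settings, can answer the same question quite differently.

```python
# Minimal sketch: same model, two different "wrappings".
# Assumes langchain-openai is installed and OPENAI_API_KEY is set.
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # placeholder model name

question = "Is this limitation fundamental to LLMs?"

# Call 1: no system prompt, the raw API's default behaviour.
plain = llm.invoke([HumanMessage(content=question)])

# Call 2: same model, but constrained by a custom system prompt,
# roughly what a chat UI or a LangChain app layers on top of the model.
constrained = llm.invoke([
    SystemMessage(content="Answer in one sentence and refuse to speculate."),
    HumanMessage(content=question),
])

print(plain.content)
print(constrained.content)
```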

Or, to put the question differently: couldn't OpenAI train, constrain, configure, and tune models, and combine them into a UI that then behaves differently from what you describe, for another use case?