That’s a very real example of the core problem: LLMs don’t reliably honor constraints, even when they’re explicit and simple. Instruction drift shows up fast in learning tasks, and quietly in production systems.

That’s why trusting them “agentically” is risky. The safer model is to assume outputs are unreliable and validate after generation.

I’m working on this exact gap with Verdic Guard (verdic.dev): treating LLM output as untrusted input and enforcing scope and correctness outside the model. It’s less about smarter prompts and more about predictable behavior.
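To make the "validate after generation" idea concrete, here's a minimal sketch. Everything in it (the function names, the constraints, the retry loop) is illustrative only, not Verdic Guard's actual API:

```python
# Sketch: treat model output as untrusted input and enforce explicit
# constraints *after* generation, outside the model. Illustrative only.
import re
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    ok: bool
    errors: list = field(default_factory=list)


def validate_output(text: str, *, max_words: int, must_match: str) -> ValidationResult:
    """Check a response against explicit constraints post-generation."""
    errors = []
    if len(text.split()) > max_words:
        errors.append(f"exceeds {max_words} words")
    if not re.search(must_match, text):
        errors.append(f"missing required pattern {must_match!r}")
    return ValidationResult(ok=not errors, errors=errors)


def guarded_generate(generate, retries=2, **constraints):
    """Reject-and-retry loop: never trust the first pass."""
    for _ in range(retries + 1):
        candidate = generate()
        result = validate_output(candidate, **constraints)
        if result.ok:
            return candidate
    raise ValueError(f"output failed validation: {result.errors}")
```

The point isn't the specific checks; it's that the constraints live in ordinary code you control, so a violation is caught deterministically instead of depending on the model honoring the prompt.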

Your Spanish example is basically the small-scale version of the same failure mode.