Also it’s the wrong tool for this kind of work.

Claude, opencode etc. Are brute force coding harnesses that literally use bash tools plus a whole bunch of vague prompting (skills, AGENT.md, MCP and all that stuff) to nudge them probabilistically into desirable behavior.

Without engineering specialized harnesses that control workflows and validate output, this issue won‘t go away.

We‘re in the wild west phase of LLM usage now, where problems emerge that shouldn’t exist in the first place and are being solved at the entirely wrong layer (outside of the harness) or with the entirely wrong tools (prompts).