One of the skills needed to effectively use AI for code is to know that telling AI "don't commit secrets" is not a reliable strategy.
Design your secrets to include a common prefix, then use deterministic scanning tools like git hooks to prevent them from being checked in.
Or have a git hook that knows which environment variables contain secrets and checks for their values.
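Here's a minimal sketch of the prefix idea as a pre-commit hook (saved as .git/hooks/pre-commit and made executable). The "myapp_sk_" prefix and the token format are hypothetical placeholders; substitute whatever scheme you mint your secrets with.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: refuse to commit staged changes that contain
strings matching a common secret prefix. The prefix below is a placeholder."""
import re
import subprocess
import sys

# Hypothetical secret format: a shared prefix followed by a long random suffix.
SECRET_PATTERN = re.compile(r"myapp_sk_[A-Za-z0-9]{16,}")

def staged_diff() -> str:
    # Only inspect what is actually being committed, not the whole working tree.
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

def main() -> int:
    hits = [
        line for line in staged_diff().splitlines()
        if line.startswith("+")
        and not line.startswith("+++")  # skip diff file headers
        and SECRET_PATTERN.search(line)
    ]
    if hits:
        print("Refusing to commit: staged changes appear to contain secrets:")
        for line in hits:
            print("  " + line[:80])
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```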
That's such an incredibly basic concept; surely AIs have evolved to the point where you don't need to explicitly state those requirements anywhere?
They can still make mistakes.
For example, what if your code (which the LLM hasn't reviewed yet) has a dumb feature where it dumps environment variables to log output, and the LLM runs "./server --log debug-issue-144.log" and then commits that log file as part of a larger piece of work you asked it to perform?
If you don't want a bad thing to happen, adding a deterministic check that prevents it is a better strategy than prompting models or hoping that they'll get "smarter" in the future.
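For the log-file scenario above, the deterministic check could be a hook that looks for the literal values of known secret environment variables in staged files. This is only a sketch, and the variable names (API_KEY, DATABASE_PASSWORD) are placeholders for whatever your environment actually uses.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: block the commit if any staged file contains the
current value of an environment variable known to hold a secret."""
import os
import subprocess
import sys

SECRET_ENV_VARS = ["API_KEY", "DATABASE_PASSWORD"]  # hypothetical names

def staged_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.split("\0") if path]

def main() -> int:
    # Map each secret value back to its variable name for reporting.
    secrets = {os.environ[v]: v for v in SECRET_ENV_VARS if os.environ.get(v)}
    failures = []
    for path in staged_files():
        # Read the staged version of the file, not the working-tree copy.
        blob = subprocess.run(
            ["git", "show", f":{path}"],
            capture_output=True, text=True, errors="replace",
        ).stdout
        for value, name in secrets.items():
            if value in blob:
                failures.append((path, name))
    for path, name in failures:
        print(f"Refusing to commit: {path} contains the value of {name}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

This would catch the committed debug log regardless of how the secret got into it, because the check runs on content, not on the model's intentions.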
Isn't part of why these things feel "not fit for purpose" that they don't include the things Simon has spent three years learning? (I know someone else who's doing multi-LLM development where he uses job-specialty descriptions for each "team member", which lets each one spend its context on a different aspect of the problem; it's a fascinating exercise to watch, but it feels even more like "if this is how the tools should be used, why don't they just work that way?")
Doesn't seem to work for humans all the time either.
Some of this negativity, I think, is due to unrealistic expectations of perfection.
Use the same guardrails you should already be using for human-generated code, and you should be fine.