To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)
Obviously they could just delete .git for their test if they wanted to. But think of telling the LLM not to use git commands the same way as having keys in a .env file and telling the LLM not to read it: if it ignores the instruction, you might be concerned.
It's not a great sign for alignment.
Agreed, but alignment is a separate issue that a vuln-fixing benchmark doesn't need to test.
Every day I am more and more convinced that AI labs can't code.