I find with Claude that when I call its BS, I get better results. It openly admits to lying to me and gaslighting me, and it says it doesn't see any way to stop itself from continuing to do so.
Fable seemed less apt to do that, but I didn't get enough time with it before it was yanked away to know for sure. It may have had mixed results on the benchmarks, but it was finding bugs Opus never found.