I've given Aider and Mentat a go multiple times, and for existing projects I've found those tools can easily make a mess of my codebase (especially on larger projects). Checkpoints aren't so useful if you have to keep rolling back and re-prompting, especially once it starts making massive (slow token output) changes. I'm always using `gpt-4`, so I feel like the model capabilities will need an upgrade before it can be reliably useful. I have tried Bloop, Copilot, Cody, and Cursor (with a preference towards the latter two), but inevitably I end up with a chat window open a fair amount. While I know things will get better, I find that LLM code generation is currently most useful to me on very specific, bounded tasks, and that the pain of giving `gpt-4` free rein over my codebase is, in practice, worse at the moment.

There is a bit of a learning curve to figuring out the most effective ways to collaboratively code with GPT, whether through aider or other UXs. My best piece of advice is taken from aider's tips list and applies broadly to coding with LLMs, or even solo:

Large changes are best performed as a sequence of thoughtful, bite-sized steps, where you plan out the approach and overall design. Walk GPT through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
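To make that concrete, here's a minimal hypothetical sketch (names and the discount rule are made up) of what "refactor to prepare, then ask for the actual change" can look like at the code level — each step is a small, reviewable diff rather than one large edit:

```python
# Hypothetical example of the refactor-then-change workflow.

# Step 0: the original function mixed CSV parsing and totaling in one line:
#   def order_total(lines):
#       return sum(int(l.split(",")[1]) * float(l.split(",")[2]) for l in lines)

# Step 1: first prompt — ask for a refactor that isolates parsing,
# with no behavior change.
def parse_line(line):
    _, qty, price = line.split(",")
    return int(qty), float(price)

def order_total(lines):
    return sum(qty * price for qty, price in (parse_line(l) for l in lines))

# Step 2: second prompt — with parsing isolated, the actual change
# (a bulk-discount rule) is a small addition instead of a rewrite.
def order_total_with_discount(lines, threshold=100.0, rate=0.1):
    total = order_total(lines)
    return total * (1 - rate) if total > threshold else total
```

Each prompt maps to one small diff, which keeps the model's output fast to review and easy to roll back if it goes sideways.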

https://github.com/paul-gauthier/aider#tips