> Sure you can't point it to a Jira ticket and get a PR
You absolutely can. This is exactly what SWE-Bench [0] measures, and I've been amazed at how quickly AIs have been climbing those ladders. I've been using Warp [1] a lot recently, and in quite a few low-to-medium-difficulty cases it can one-shot a decent PR. For most of my work I still find that I need to pair with it to get sufficiently good results, which is why I still prefer it to something cloud-based like Codex [2] (though Codex is otherwise quite good too), but I expect the situation to flip over the coming couple of years.
How does Warp compare to others you have tried?
I've not used it for long enough yet for this to be a strong opinion, but so far I'd say that it is indeed a bit better than Claude Code, in line with the results on Terminal Bench [0]. On a side note, I quite like that I can type shell commands and chat messages interchangeably into the same input and it just knows whether to run the input or respond to it (accidentally forgetting the leading exclamation mark has been a recurring mistake for me in Claude Code).
[0] https://www.tbench.ai/