Not surprising. LLMs have only been helpful to me for initial exploration of unfamiliar code bases or languages.
Using one beyond that is just more work: first you parse the broken response, then strip out the useless junk, then have it reprocess an updated query.
It’s a nice tool to have (just as search engines gave us easy access to multiple sources and forums), but its limitations are well known. Trying to use it 100% as intended is a massive waste of time and resources (energy use…).