> the very first time I allowed it to look at my codebase, it hallucinated a missing brace (my code parsed fine), "helpfully" inserted it, and then proceeded to break everything.
This is not an inherent flaw of LLMs; rather, it is a flaw of a particular implementation. If you use guided sampling, where at each step you only consider tokens allowed by the programming language's grammar at that position, it becomes impossible for the LLM to generate ungrammatical output.
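Here's a minimal sketch of the idea, using a toy "grammar" (brace balancing) and a hypothetical model interface; `vocab`, `logits`, and `prefix_is_valid` are stand-ins, not any particular library's API:

```python
# Guided sampling sketch: mask tokens the grammar disallows, then sample
# from the renormalized distribution over the survivors.
import math, random

def prefix_is_valid(text: str) -> bool:
    # Toy "grammar": braces must never close more than they open.
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return True

def constrained_sample(prefix: str, vocab: list[str], logits: list[float]) -> str:
    # Keep only tokens whose addition leaves the prefix grammatical.
    allowed = [(tok, lg) for tok, lg in zip(vocab, logits)
               if prefix_is_valid(prefix + tok)]
    if not allowed:
        raise ValueError("grammar admits no continuation of this prefix")
    total = sum(math.exp(lg) for _, lg in allowed)
    r = random.random() * total
    for tok, lg in allowed:
        r -= math.exp(lg)
        if r <= 0:
            return tok
    return allowed[-1][0]

# A stray "}" at depth 0 is masked out, so the hallucinated-brace failure
# mode described above is ruled out by construction.
print(constrained_sample("foo()", ["{", "}", ";"], [0.1, 2.0, 0.5]))
```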
> When it does this, I feel the hallucinations can be off the charts -- inventing APIs, function names, entire libraries,
They can use guided sampling for this too: if you know the set of function names that exist in the codebase and its dependencies, you can reject tokens that correspond to non-existent function names during sampling.
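The same masking trick applies at the identifier level: only allow tokens that keep the partially generated name a prefix of some symbol actually present in the project. A sketch, where `KNOWN_SYMBOLS` is a hypothetical symbol table you would build by indexing the codebase and its dependencies:

```python
# Reject tokens that would start spelling out a function name that
# doesn't exist anywhere in the indexed codebase.
KNOWN_SYMBOLS = {"parse_config", "parse_args", "load_plugins"}

def identifier_token_allowed(partial_name: str, token: str) -> bool:
    candidate = partial_name + token
    # Allow the token only if some real symbol starts with the result.
    return any(sym.startswith(candidate) for sym in KNOWN_SYMBOLS)

assert identifier_token_allowed("parse_", "config")    # real function
assert not identifier_token_allowed("parse_", "json")  # would be invented
```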
Another approach, instead of or alongside guided sampling, is to use an agent with function calling, so the LLM can try compiling the modified code itself and then attempt to recover from any errors that occur.
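A sketch of that loop, using Python's own `py_compile` as the checker; `ask_llm` is a hypothetical stand-in for whatever model call you use:

```python
# Agent-loop sketch: propose an edit, try to compile it, and feed any
# compiler error back to the model for another attempt.
import os, py_compile, tempfile

def compile_check(source: str) -> str | None:
    # Returns None on success, or the compiler error message on failure.
    path = None
    try:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as e:
        return str(e)
    finally:
        if path:
            os.unlink(path)

def repair_loop(ask_llm, source: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        error = compile_check(source)
        if error is None:
            return source  # compiles: accept the edit
        # Hand the error back to the model and ask for a fixed version.
        source = ask_llm(f"This edit fails to compile:\n{error}\nFix it:\n{source}")
    raise RuntimeError("could not produce a compiling edit")
```

Unlike guided sampling, this catches errors the grammar alone can't express (missing imports, type errors with a stricter checker), at the cost of extra round trips to the model.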