But it still has to be auditable by humans, so I imagine some sort of LLM tool library on top of an existing language makes sense. Might be wrong! But LangChain tools and Pydantic schemas for input/output feel like the right abstraction.
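
To make that concrete, here is a rough sketch of the idea: a plain function wrapped with a typed input and output schema, so a human can audit both the contract and the body. It uses stdlib dataclasses as a stand-in for Pydantic models, and the `Tool` class, `word_count` function, and schema names are all made up for illustration (this is not the LangChain API).

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Callable


@dataclass(frozen=True)
class WordCountInput:
    """Hypothetical input schema (a Pydantic model would play this role)."""
    text: str


@dataclass(frozen=True)
class WordCountOutput:
    """Hypothetical output schema."""
    words: int


def word_count(args: WordCountInput) -> WordCountOutput:
    # The tool body is ordinary, human-readable code.
    return WordCountOutput(words=len(args.text.split()))


def check_types(obj) -> None:
    # Crude runtime check for simple field types only; a real schema
    # library does this far more thoroughly.
    for f in fields(obj):
        if not isinstance(getattr(obj, f.name), f.type):
            raise TypeError(f"{f.name} must be {f.type.__name__}")


@dataclass(frozen=True)
class Tool:
    """Hypothetical wrapper: name + description + typed I/O around a function."""
    name: str
    description: str
    input_type: type
    output_type: type
    func: Callable

    def call(self, raw_json: str) -> str:
        # The model hands us JSON; we validate it before running anything.
        args = self.input_type(**json.loads(raw_json))
        check_types(args)
        result = self.func(args)
        if not isinstance(result, self.output_type):
            raise TypeError("tool returned an unexpected type")
        check_types(result)
        return json.dumps(asdict(result))


word_count_tool = Tool(
    name="word_count",
    description="Count whitespace-separated words in a text.",
    input_type=WordCountInput,
    output_type=WordCountOutput,
    func=word_count,
)

print(word_count_tool.call('{"text": "audit me please"}'))
```

The point is just that the model only ever sees the name, description, and schemas, while the reviewer sees a normal function in an existing language.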

I can see the argument, though. Is anything moving in that direction already?

Not that I know of! It's an interesting idea though, and as you say, it should remain auditable.

Along those lines, it makes me wonder whether it would be possible to design an LLVM output (i.e. one that works with existing code) that is especially well optimized for interop with a specialized LLM, e.g. by encoding more information more compactly or something.