This strikes me as a very agent-friendly problem. Given a harness that enforces sufficiently-rigorous tests, I'm sure you could spin up an agent loop that methodically churns through these functions one by one, finishing in a few days.
hallucinations in a libc implementation would be especially bad
Have you ever used an LLM with Zig? It will generate syntactically invalid code. Zig changes so often, and LLM knowledge cutoffs lag so far behind, that they only know old, broken versions.
The same goes for TLA+ and all the other obscure things people think would be great to use with LLMs. They would be, if there were as much training data for them as there is for JavaScript and Python.
i find claude does quite well with zig. this project is like > 95% claude, and it's an incredibly complicated codebase [0] (which is why i am not doing it by hand):
https://github.com/ityonemo/clr
[0] it generates a dynamically loaded library that does sketchy shit to access the binary representation of data structures in the zig compiler, then transpiles the IR to zig code, which has to be rerun to do the analysis.
To be fair, this was true of early public LLMs with Rust code too. As more public Zig repositories (and blogs / docs / videos) come online, they will improve. I agree it's a mess currently.
You must have not tried this with an LLM agent in the past few months.
i tested sonnet 4.5 just last week on a zig codebase and it has to be reminded of the std.ArrayList syntax every time.
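for reference, this is the kind of thing it trips over: a rough sketch from memory of the 0.15-style std.ArrayList usage vs what the model keeps writing (double-check against your actual toolchain; the test name is just illustrative):

    const std = @import("std");

    // what the model keeps writing (roughly the pre-0.15 managed style):
    //   var list = std.ArrayList(u8).init(allocator);
    //   defer list.deinit();
    //   try list.append(42);

    // 0.15-style sketch: std.ArrayList is unmanaged, so every allocating
    // call takes the allocator explicitly.
    test "arraylist sketch" {
        const allocator = std.testing.allocator;

        var list: std.ArrayList(u8) = .empty;
        defer list.deinit(allocator);

        try list.append(allocator, 42);
        try list.appendSlice(allocator, &.{ 1, 2, 3 });

        try std.testing.expectEqual(@as(usize, 4), list.items.len);
    }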
I made a Zig agent skill yesterday if interested: https://github.com/rudedogg/zig-skills/
Claude getting the ArrayList API wrong every time was a major reason why I made it.
It's AI-generated but should help. I need to test and review it more (I noticed it mentions async, which isn't in 0.15.x :| )
The linked blog post about making this is an excellent read.
Thanks! I think I spent as much time writing the post as I did making the skill, so I’m happy someone got some value out of it.
Fighting fire with fire
A little bit! I wrote a long blog post about how I made it. I think the strategy of having an LLM look at individual std modules one by one makes it pretty accurate. Not perfect, but better than I expected.
Try it again, and this time put something different in CLAUDE.md (a rough sketch of what I mean is below). By the way, it's happy to edit its own CLAUDE.md files (don't have an agent edit another agent's CLAUDE.md files, though [0])
0: https://news.ycombinator.com/item?id=46723384
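To be concrete, something along these lines is what I mean; a rough sketch of a CLAUDE.md for a Zig project, not a recipe (adapt the version pin and the rules to your own setup):

    # CLAUDE.md (sketch)

    - This project targets Zig 0.15.x. Do not use std APIs that were renamed
      or removed around that release; when unsure, read the source under
      lib/std/ instead of guessing.
    - std.ArrayList is unmanaged: start from `.empty` and pass the allocator
      to `append`, `appendSlice`, and `deinit`.
    - After every change, run `zig build test` and fix all compile errors
      before moving on.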
Are you using an agent? It can quickly notice the issue and fix it. Obviously if it's trained on an older version it won't know the new APIs.