> "Tbh they could've just hooked up zig translate-c to c2rust".
Have you ever seen what comes out of c2rust? It's awful. It relies on a library of functions which emulate unsafe C pointer semantics with unsafe Rust.
A few years ago, when I was struggling with bugs in OpenJPEG (a JPEG 2000 decoder), someone tried running it through c2rust. The converted unsafe rust segfaulted at the same place the C code did. It's compatible, but not safe.
Main insight: don't do string manipulation in C or unsafe Rust. It's totally the wrong tool for the job.
> Have you ever seen what comes out of c2rust? It's awful. It relies on a library of functions which emulate unsafe C pointer semantics with unsafe Rust.
which is somewhat close to what their port produced...
like their goal was from the get to go to have a mostly exactly the same as zig "just in rust" which implies mostly unsafe rust and all the soundness/memory issues zig has (plus probably some more due to AI based port instead of a tool like c2ruts)
the thing is if you don't keep things mostly 1:1 with all the problems that has there is absolutely no way to review that PR or catch the AI going rogue with hallucinations etc. With a mostly 1:1 port you can at least check if things seem mostly the same.
but it also means this is just step 1 of very many, with the other being incrementally fixing soundness, removing unsafe and (hopefully) making the code more idiomatic...
(to got to the actual question of why?, I think the answer is doing this port using AI is likely way easier/faster then first writing a tool which need in depth understanding of the languages, especially given that some features in zig do not map 1:1 in rust and fuzzily mapping is what LLMs are good at and human hand written tools tend to be very bad at).
> The converted unsafe rust segfaulted at the same place the C code did. It's compatible, but not safe
That is indeed the point of c2rust. It gives you a baseline that is semantically identical to the original codebase, and with that passing the full test suite, bug-for-bug, you can then start gradually adopting rusty idioms to improve the memory safety of the codebase.
What comes out of c2rust is not intended for human consumption. It's more verbose than the original and harder to work on, but no safer. You lose the C idioms that people understand, while not gaining Rust idioms. It's like working on compiler-generated assembly code by hand.
2022 discussion on HN.[1]
There's a DARPA funded effort called TRACTOR, Translate All C To Rust, which has funded some efforts to develop a usable translator.[2] It's about 10 months after award, with no reported progress. I've been checking the personal sites of the academics involved, and they barely mention the project, although $5 million has been allocated to it.[3] The approach comes from U.C. Berkeley - let the LLM generate slop, check it using formal methods.[4] Not expecting near-term results.
[1] https://news.ycombinator.com/item?id=30169263
[2] https://csl.illinois.edu/news-and-media/translating-legacy-c...
[3] https://chandrasekaran-group.github.io/
[4] https://metalift.pages.dev/
> let the LLM generate slop, check it using formal methods
I'm much more bullish on the opposite approach. Perform the naive translation, let the LLM loose on cleaning it up...
The module with the code mentioned is at [1]
This is awful. They have some internal string format borrowed from a Zig library where the address of the item is in the low end of a pointer and the length is at the high end. Why are they doing that in 2026? It lets you save a few bytes at best. It doesn't enforce the Rust rule that strings must be strict UTF-8. It's totally alien to the safe way Rust handles strings.
[1] https://github.com/oven-sh/bun/blob/main/src/bun_core/string...
> It doesn't enforce the Rust rule that strings must be strict UTF-8
Judging by the name, nor should it, because OS-paths aren't always UTF-8. See for example the rust standard library type OsString https://doc.rust-lang.org/std/ffi/struct.OsString.html
The rust std library string is a reasonable default, but it's not always the right choice. Lots of projects use different things for good reasons.
For the same reason the V8 team bothered to set up a 32-bit addressing scheme for the GC heap even on 64-bit platforms, I imagine? The bytes add up when there’s enough of them.
What they’ve done here isn’t safe either, and doesn’t have the consistent translation of rust2c.
Sure, but the point remains. They could've used Claude to build a Zig to Rust converter, ended up with something that was both deterministic _and_ beneficial to the wider community.