Author here: I absolutely do not understand this mindset. Woxi has almost 20K unit tests by now, plus hundreds of full end-to-end tests of complicated scripts, to ensure it works and matches the output of WolframScript. Why does it matter that I used Claude to help me implement it?
https://github.com/ad-si/Woxi/tree/main/tests
Spot checking, I don't see any issues.
e.g. https://github.com/ad-si/Woxi/blob/main/tests/list_tests.rs
Are they 20k unit tests or sloppy tests? Would 100k unit tests make it better?
My issue is that Mathematica is essentially a term rewriting system. Reimplementing everything in Rust seems to go against the idea. The derivative computation is 400 lines of Rust and could be 20 lines of Mathematica code.
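To make the comparison concrete, here is a minimal sketch of symbolic differentiation over an expression tree in Rust. I have not checked it against Woxi's code: the `Expr` type, the helper constructors, and the `diff` and `eval` functions are all hypothetical names for illustration, and only the constant, variable, sum, and product rules are covered. Each match arm plays the role of one rewrite rule.

```rust
// Hypothetical expression type for illustration; this is not Woxi's AST.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Num(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Small constructors to keep the rules below readable.
fn num(n: f64) -> Expr { Expr::Num(n) }
fn var(name: &str) -> Expr { Expr::Var(name.to_string()) }
fn add(a: Expr, b: Expr) -> Expr { Expr::Add(Box::new(a), Box::new(b)) }
fn mul(a: Expr, b: Expr) -> Expr { Expr::Mul(Box::new(a), Box::new(b)) }

/// Differentiate `e` with respect to `x`. Each match arm is one rewrite rule.
fn diff(e: &Expr, x: &str) -> Expr {
    match e {
        // d/dx c = 0
        Expr::Num(_) => num(0.0),
        // d/dx x = 1, d/dx y = 0
        Expr::Var(v) => num(if v == x { 1.0 } else { 0.0 }),
        // Sum rule: (a + b)' = a' + b'
        Expr::Add(a, b) => add(diff(a, x), diff(b, x)),
        // Product rule: (a * b)' = a' * b + a * b'
        Expr::Mul(a, b) => add(
            mul(diff(a, x), (**b).clone()),
            mul((**a).clone(), diff(b, x)),
        ),
    }
}

/// Evaluate `e` with the variable `x` bound to `at`; other variables give NaN.
fn eval(e: &Expr, x: &str, at: f64) -> f64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Var(v) => if v == x { at } else { f64::NAN },
        Expr::Add(a, b) => eval(a, x, at) + eval(b, x, at),
        Expr::Mul(a, b) => eval(a, x, at) * eval(b, x, at),
    }
}
```

For example, `diff(&add(mul(var("x"), var("x")), mul(num(3.0), var("x"))), "x")` yields an unsimplified tree for 2x + 3. The core rules are short either way; a real implementation also needs simplification, the chain rule, and a rule for every built-in function, which is presumably where the line count goes.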
My theory is that writing as much as possible in Rust will improve performance and produce higher-quality code due to Rust's static typing. So far, it's been working well, but the final verdict is still out.
I have not looked at the implementation, but isn't the idea to write a Lispy language in Rust (in other words, Mathematica the language) and then write the differentiation and other routines in that?
They had to patch the Rust compiler to natively support AutoDiff.
Contrast that with Julia, where it can be a regular Julia library.
I mean, you can do autodiff with a regular library in Rust. Enzyme is just a very specific kind of autodiff that transforms the code after some compilation has already taken place.
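As a rough illustration of the library-only approach, here is a minimal sketch of forward-mode autodiff with dual numbers in plain Rust, using nothing beyond operator overloading from the standard library. The `Dual` type and the `derivative` and `f` functions are hypothetical names made up for this example, not the API of any existing crate, and only addition, multiplication, and `sin` are covered.

```rust
use std::ops::{Add, Mul};

/// Dual number: `val` carries f(x), `der` carries f'(x).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    der: f64,
}

impl Dual {
    /// The independent variable: d/dx x = 1.
    fn var(x: f64) -> Self { Dual { val: x, der: 1.0 } }
    /// A constant: d/dx c = 0.
    fn constant(c: f64) -> Self { Dual { val: c, der: 0.0 } }
    /// Chain rule for sine: (sin u)' = u' * cos u.
    fn sin(self) -> Self {
        Dual { val: self.val.sin(), der: self.der * self.val.cos() }
    }
}

impl Add for Dual {
    type Output = Dual;
    // Sum rule: (a + b)' = a' + b'
    fn add(self, o: Dual) -> Dual {
        Dual { val: self.val + o.val, der: self.der + o.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (a * b)' = a' * b + a * b'
    fn mul(self, o: Dual) -> Dual {
        Dual { val: self.val * o.val, der: self.der * o.val + self.val * o.der }
    }
}

/// Evaluate the derivative of `f` at `x` by seeding the dual part with 1.
fn derivative(f: impl Fn(Dual) -> Dual, x: f64) -> f64 {
    f(Dual::var(x)).der
}

/// Example function: f(x) = x^2 + 3x, so f'(x) = 2x + 3.
fn f(x: Dual) -> Dual {
    x * x + Dual::constant(3.0) * x
}
```

With these definitions, `derivative(f, 2.0)` evaluates to `7.0`. The derivative is computed alongside the value at run time, one operation at a time, which is the main difference from a compiler-level transformation like Enzyme.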
I don't know that you can match the speed of something like a JIT or expression templates in Rust, though, without using something like Enzyme.
No, they're implementing all functions, all matching etc. in Rust.
Will that be faster? Seems like it should be a lot faster.
I see.
AI has a tendency toward "just make it pass!" (which, to be fair, you also sometimes see from junior human devs; maybe that's where it learnt it!). Remember that C compiler which didn't even do basic error checking because that wasn't checked by the test suite?
A very young project written by AI means you haven't reviewed the code and nobody has used it in anger. It might work perfectly, but my experience of AI so far says that it won't.
It could very well be 20K slop tests.