The AI assistant complaints track with what I see on my end. Any general model I throw Wolfram Language at does noticeably worse than it does on Python. That part isn't surprising. There just isn't much public Wolfram code to learn from next to the mountain of Python sitting on GitHub. It keeps guessing function names that sound plausible but don't exist. Spent an afternoon last week fixing hallucinated options on an NDSolve call it gave me.
>It keeps guessing function names that sound plausible but don't exist.
That's surprising considering how good their documentation is. A tool-using LLM should have no problem with that. Wolfram Language is almost ideal for an LLM, actually.
LLMs learn more from training on examples than from training on documentation, especially since the examples usually add up to far more data than the docs do.