> this is 2025, how well do LLMs generate code for obscure languages where the training data is more sparse?

You'd be surprised: https://github.com/Tencent-Hunyuan/AutoCodeBenchmark/blob/b1...

Wow, Elixir is the highest? That's interesting.