Judging from the web demo, this model is really good at numbers. It rushes through them and slurs them together a bit, but they are all correct, even 7-digit numbers (I didn't test further).
It looks like they are sidestepping these kinds of issues by generating the phonemes with the preprocessing stage of a traditional speech synthesizer, and using the LLM only to turn those phonemes into natural-ish sounding speech. That limits how natural the model can become, but it should be able to correctly pronounce anything the preprocessing can pronounce.
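To make the idea concrete, here is a rough sketch of the kind of number expansion a traditional synthesizer front end does before anything reaches the phoneme stage. This is not this model's actual preprocessing, which I haven't seen; the names `number_to_words` and `normalize_numbers` are made up for illustration, and it only handles non-negative integers below one billion in English.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(1_000_000, "million"), (1_000, "thousand")]


def _below_thousand(n):
    """Return the words for 0 <= n < 1000 as a list (empty list for 0)."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        tens = TENS[n // 10]
        if n % 10:
            tens += "-" + ONES[n % 10]
        words.append(tens)
    elif n > 0:
        words.append(ONES[n])
    return words


def number_to_words(n):
    """Spell out an integer in the range 0..999,999,999 (illustrative only)."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("this sketch only covers 0..999,999,999")
    if n == 0:
        return "zero"
    words = []
    for value, name in SCALES:
        if n >= value:
            words += _below_thousand(n // value) + [name]
            n %= value
    words += _below_thousand(n)
    return " ".join(words)


def normalize_numbers(text):
    """Replace every run of digits with its spelled-out form.

    Digit runs longer than nine digits raise ValueError in this sketch.
    """
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)


print(normalize_numbers("The total was 1234567 units."))
```

Because the digits are already spelled out deterministically at this point, the later stages never have to "read" a number themselves, which would explain why even 7-digit numbers come out correct. A real front end also has to deal with ordinals, years, currency, decimals and so on, which this sketch ignores.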