Can anyone tell me how to generate sound from text locally?

You're looking for text-to-speech. Qwen actually has a model and library for this: Qwen3-TTS [1].

[1]: https://github.com/QwenLM/Qwen3-TTS