Current music generators use next-token prediction over audio tokens, like LLMs, rather than the denoising approach used by image generators [0].
[0] https://arxiv.org/abs/2503.08638 (grep for "audio token")
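
To make "next token prediction" concrete, here is a rough sketch of the sampling loop, not the method from [0]. Everything in it is an assumption for illustration: `VOCAB_SIZE`, `toy_next_token_weights`, and `generate_audio_tokens` are hypothetical names, and the "model" is a toy function standing in for a trained network.

```python
import random

VOCAB_SIZE = 1024  # hypothetical codebook size for an audio tokenizer


def toy_next_token_weights(context):
    """Stand-in for a trained model: unnormalized weights over the vocabulary."""
    last = context[-1] if context else 0
    # Favor tokens close to the previous one, purely for illustration.
    return [1.0 / (1 + abs(tok - last)) for tok in range(VOCAB_SIZE)]


def generate_audio_tokens(prompt, num_new, seed=0):
    """Autoregressive loop: sample one token, append it, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(num_new):
        weights = toy_next_token_weights(tokens)  # conditioned on everything so far
        next_tok = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
        tokens.append(next_tok)
    return tokens


tokens = generate_audio_tokens(prompt=[17, 42], num_new=8)
print(tokens)
```

The loop has the same shape as text generation: each new token is sampled conditioned on all previous ones. A real system would presumably turn the resulting token sequence back into a waveform with a learned audio codec, which this sketch leaves out.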