This already exists and is interesting to play around with - https://github.com/ASLP-lab/DiffRhythm