I’ve always said that the more divergent the input is from the resulting output, the less personal expression you retain. For me, in order of moving away from meaningful control in generative models, it goes: “text → code,” “text → picture,” and, at the very bottom, “text → music.”

For me personally, music composition begins and ends with the motif, the melody itself. It’s the part I enjoy the most, and it’s also the part I have the most individual control over, since I can sing.

Everybody makes music differently, but if you can’t play an instrument and you also can’t whistle or sing, it’s hard for me to imagine how you’d have any meaningful control over the melody.

How would a non‑musician express an actual melody they came up with (beyond simple things like instrumentation and general “feelings”) in text? RED RED RED BLUE. (Sorry, couldn’t resist a Mission Hill reference here.)

With all that out of the way, there’s still plenty of room for AI in music. I’ve used it to take some of my existing songs, mostly pianistic in nature, and swap out the instrumentation and arrangements just to play around with different soundscapes. It’s like Band-in-a-Box (BIAB) on steroids.