AIs are able to generate text, images, and now videos. What about music?
Why is there no AI that you can prompt with "generate a song, country style, from the 80s, with a female voice singing, dynamic, background violin notes, 3 verses and a chorus" and then "good, now add some bass, try another melody, and lower the singer's pitch", etc.?
I feel it's complicated to do, but at the same time much easier than generating videos (or even images). In particular, the dataset such a model could be trained on is very clean: there's basically the entire Spotify/Apple Music library, with the genre, year, number of listens, and all the other info on each track.
Why is there no such thing yet? Why is it so far behind text/image/video generation?