This work proposes a novel method to automatically generate, for a given piece of music, a music video (MV) assembled from segments of existing YouTube music videos that fit this music. The system analyzes the input music to determine its genre (pop, rock, ...) and retrieves from the database segmented MVs of the same genre. K-Means clustering then groups the video segments by color histogram, so that each cluster contains segments with broadly the same distribution of colors. A few clusters are randomly selected, and their segments are assembled around music boundaries, which are moments where the music changes significantly (for instance, the transition from verse to chorus). This way, when the music changes, the color mood of the video changes as well.
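The clustering and assembly steps described above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration and not the project's actual code: the function names (color_histogram, kmeans, plan_sections), the 4-bins-per-channel histogram, the farthest-first centroid initialization, and the rule of cycling through the chosen clusters at each boundary are all assumptions made for the example.

```python
import random


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values in 0-255."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in [0, bins - 1].
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per input vector."""
    # Assumption: deterministic farthest-first initialization instead of random seeds.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def plan_sections(boundaries, duration, chosen_clusters):
    """Give each section between music boundaries a cluster, switching at every boundary."""
    edges = [0.0] + sorted(boundaries) + [duration]
    return [(start, end, chosen_clusters[i % len(chosen_clusters)])
            for i, (start, end) in enumerate(zip(edges, edges[1:]))]


if __name__ == "__main__":
    # Hypothetical segments: two mostly red, two mostly blue.
    segments = [
        [(250, 10, 10)] * 50,
        [(250, 10, 10)] * 40 + [(10, 10, 10)] * 10,
        [(10, 10, 250)] * 50,
        [(10, 10, 250)] * 45 + [(10, 10, 10)] * 5,
    ]
    labels = kmeans([color_histogram(s) for s in segments], k=2)
    # Randomly pick which clusters to use, then alternate them at each boundary.
    chosen = random.Random(0).sample(sorted(set(labels)), 2)
    print(labels)
    print(plan_sections([30.0, 60.0], 90.0, chosen))
```

In this sketch, each planned section would then be filled with video segments drawn from its assigned cluster, so the dominant colors change whenever the music crosses a boundary. Detecting the boundaries and the genre is outside the scope of the example.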
The Python implementation of this project can be found in this GitHub repository.
This project was carried out as part of a master's thesis at Tsinghua University. A paper (eventually rejected, which can be found here) was submitted to the ACM Multimedia conference and explains the method in more detail.
During the research and defense phases of this thesis, a website was put online so that any user could try the code and generate a video for their own music. You can find the website code here and the demonstration video below:
To evaluate our method, we asked volunteers to judge a total of 30 music videos and classify each one into one of three categories: generated, professional, or amateur MV. They were also asked to explain their decision in a text input field.
We used Amazon Mechanical Turk to assign each MV to exactly 10 different users. We grouped the classification answers by music video category to evaluate the performance of our algorithm, as shown in the figure below:
Results show that our generated MVs are most often perceived as human-made videos: 45% of generated videos were mistaken for professional music videos, and 21.6% were mistaken for amateur-made music videos. As expected, the percentage of videos classified as PRO increases with video quality (generated < amateur < professional) and, conversely, the percentage of videos classified as GEN decreases with video quality.
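Percentages of this kind can be computed by grouping the answers by the true category of each video and counting how often each label was chosen. The sketch below illustrates that tally only. The function name classification_shares and the votes in the example are hypothetical and are not the study's data.

```python
from collections import Counter


def classification_shares(votes):
    """For each true category, return the percentage of answers given to each label."""
    shares = {}
    for true_category, labels in votes.items():
        counts = Counter(labels)
        total = len(labels)
        shares[true_category] = {label: 100.0 * n / total for label, n in counts.items()}
    return shares


if __name__ == "__main__":
    # Made-up answers for illustration only (GEN = generated, PRO = professional, AMA = amateur).
    votes = {
        "generated": ["PRO", "PRO", "AMA", "GEN"],
        "professional": ["PRO", "PRO", "PRO", "GEN"],
    }
    for category, share in classification_shares(votes).items():
        print(category, share)
```

With such a table, the share of generated videos perceived as human-made is the sum of their PRO and AMA percentages.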
Of these 30 MVs, 15 were generated music videos selected for their quality, 7 were professional music videos selected randomly on YouTube, and 8 were amateur music videos found on YouTube. The music videos used for this experiment are presented below.