The proposed MV generation framework

  • Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing

  • In the proposed music video (MV) generation system, uniform video segmentation is first applied to divide a queried long user-generated video (UGV) into several video segments.
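
  The uniform segmentation step can be sketched as a simple fixed-length split; the 2-second segment length below is an illustrative assumption, not a value from the paper:

  ```python
  def uniform_segments(duration_s, seg_len_s=2.0):
      """Split a video of `duration_s` seconds into fixed-length segments.

      Returns a list of (start, end) times in seconds; the final segment
      may be shorter. `seg_len_s` is an assumed default for illustration.
      """
      segments = []
      start = 0.0
      while start < duration_s:
          end = min(start + seg_len_s, duration_s)
          segments.append((start, end))
          start = end
      return segments

  # e.g. a 5-second UGV yields segments (0-2), (2-4), (4-5)
  segs = uniform_segments(5.0, 2.0)
  ```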


  • For each video segment, a multi-task deep neural network (MDNN) predicts pseudo acoustic (music) features from the visual (video) features; this step is called pseudo song prediction.
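
  A minimal sketch of the multi-task idea: a shared trunk maps visual features into a hidden representation, and separate heads produce the pseudo acoustic features plus an auxiliary task output. All dimensions and the auxiliary head are illustrative assumptions, and the weights here are random (untrained):

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  # Assumed dimensions: 128-d visual features in, 20-d acoustic features out.
  D_VIS, D_HID, D_AC, D_AUX = 128, 64, 20, 4

  # Shared trunk plus two task-specific heads (the multi-task structure):
  # one head regresses pseudo acoustic features, the other is a hypothetical
  # auxiliary head included only to show the multi-task layout.
  W_shared = rng.standard_normal((D_VIS, D_HID)) * 0.01
  W_acoustic = rng.standard_normal((D_HID, D_AC)) * 0.01
  W_aux = rng.standard_normal((D_HID, D_AUX)) * 0.01

  def predict_pseudo_song(visual_feat):
      """Map one segment's visual feature vector to pseudo acoustic features."""
      h = np.maximum(visual_feat @ W_shared, 0.0)   # shared ReLU layer
      return h @ W_acoustic, h @ W_aux              # (acoustic, auxiliary) heads

  acoustic, aux = predict_pseudo_song(rng.standard_normal(D_VIS))
  ```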


  • A dynamic time warping (DTW) algorithm with a pseudo-song-based deep similarity matching (PDSM) metric is used to align the UGV with a music track based on the acoustic features.
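
  The DTW alignment can be sketched as standard dynamic programming over a precomputed dissimilarity matrix; in the system that matrix would come from the PDSM metric, whereas here it is any non-negative matrix supplied by the caller:

  ```python
  import numpy as np

  def dtw(cost):
      """Align two sequences given cost[i, j], the dissimilarity between
      video segment i and music segment j. Returns the total alignment
      cost and the warping path as a list of (i, j) pairs."""
      n, m = cost.shape
      acc = np.full((n + 1, m + 1), np.inf)
      acc[0, 0] = 0.0
      for i in range(1, n + 1):
          for j in range(1, m + 1):
              acc[i, j] = cost[i - 1, j - 1] + min(
                  acc[i - 1, j],      # insertion
                  acc[i, j - 1],      # deletion
                  acc[i - 1, j - 1],  # match
              )
      # Backtrack to recover the alignment path.
      path, i, j = [], n, m
      while (i, j) != (0, 0):
          path.append((i - 1, j - 1))
          _, i, j = min((acc[i - 1, j - 1], i - 1, j - 1),
                        (acc[i - 1, j], i - 1, j),
                        (acc[i, j - 1], i, j - 1))
      return acc[n, m], path[::-1]
  ```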


  • The video editing module then uses the target and concatenation costs to select and concatenate segments of the UGV, following the DTW alignment, and generates a music-compliant, professional-looking video for each candidate music track.
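
  Selection under target and concatenation costs can be sketched as a unit-selection-style dynamic program (Viterbi over segments): the target cost scores a segment against a slot in the aligned music, and the concatenation cost penalizes the cut between consecutive chosen segments. Both cost matrices below are illustrative inputs, not the paper's definitions:

  ```python
  import numpy as np

  def select_segments(target_cost, concat_cost):
      """Pick one video segment per music slot, minimizing the summed
      target and concatenation costs.

      target_cost[t, s]   : cost of placing segment s at slot t.
      concat_cost[s1, s2] : cost of cutting from segment s1 to segment s2.
      Returns (segment sequence, minimum total cost).
      """
      T, S = target_cost.shape
      best = target_cost[0].copy()
      back = np.zeros((T, S), dtype=int)
      for t in range(1, T):
          trans = best[:, None] + concat_cost   # (prev segment, next segment)
          back[t] = trans.argmin(axis=0)
          best = trans.min(axis=0) + target_cost[t]
      # Backtrack the best segment choice per slot.
      seq = [int(best.argmin())]
      for t in range(T - 1, 0, -1):
          seq.append(int(back[t, seq[-1]]))
      return seq[::-1], float(best.min())
  ```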


  • Finally, the cost ranking module ranks all the generated MVs and recommends the best one to the user.
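
  The final ranking reduces to sorting candidate MVs by their total generation cost (lower is better); the track names and cost values below are made up for illustration:

  ```python
  def rank_mvs(candidates):
      """Rank generated MVs by total cost and return them in
      recommendation order. `candidates` is a list of (track_id, cost)
      pairs, where cost would combine the alignment and editing costs."""
      return sorted(candidates, key=lambda tc: tc[1])

  ranked = rank_mvs([("track_b", 3.2), ("track_a", 1.1), ("track_c", 2.7)])
  best_track = ranked[0][0]  # the lowest-cost MV is recommended
  ```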