UniComposer
A novel music generation pipeline that composes at the band level. It uses a hierarchical multi-track music representation together with four cascaded diffusion models, which progressively generate rhythm features and unified features; the unified features are extracted from both symbolic and audio music by autoencoders.
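The cascade idea can be sketched as a chain of stages in which each stage sees every feature produced before it. This is a minimal sketch under stated assumptions: the stage names (`rhythm`, `latent_1` to `latent_3`) and the `make_stage` helper are hypothetical placeholders, and each stage returns a labelled string where a real stage would sample a diffusion model conditioned on earlier outputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signature: reads all features so far, returns new ones.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def make_stage(name: str, depends_on: str) -> Stage:
    """Build a placeholder stage that 'generates' `name` from `depends_on`."""
    def stage(features: Dict[str, object]) -> Dict[str, object]:
        # A real stage would sample a diffusion model conditioned on features[depends_on].
        return {name: f"{name}({features[depends_on]})"}
    return stage


@dataclass
class CascadedPipeline:
    """Runs stages in order; later stages condition on earlier outputs."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, conditions: Dict[str, object]) -> Dict[str, object]:
        features = dict(conditions)  # start from the extracted input features
        for stage in self.stages:
            features.update(stage(features))
        return features


# Four hypothetical stages, mirroring "four cascaded diffusion models".
pipeline = CascadedPipeline([
    make_stage("rhythm", "melody"),
    make_stage("latent_1", "rhythm"),
    make_stage("latent_2", "latent_1"),
    make_stage("latent_3", "latent_2"),
])
out = pipeline.run({"melody": "m", "time_signature": "4/4"})
print(out["latent_3"])  # latent_3(latent_2(latent_1(rhythm(m))))
```

The dictionary keeps every intermediate result, so any later stage can condition on any earlier feature, not only the one immediately before it.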
Band-level Music Generation
Capable of allocating instruments according to the musical features and to differences in each instrument's expressive potential and performance characteristics.
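One performance characteristic that separates instrument groups is how many notes an instrument can sound at once. The sketch below sorts a track into the monophonic, polyphonic and percussion categories by its maximum polyphony. It assumes notes are `(onset, offset, pitch)` tuples, and the helpers `max_polyphony` and `track_category` are hypothetical, not the allocation method used by UniComposer.

```python
from typing import List, Tuple

Note = Tuple[float, float, int]  # (onset_sec, offset_sec, midi_pitch), an assumed format


def max_polyphony(notes: List[Note]) -> int:
    """Largest number of notes sounding at the same time (event sweep)."""
    events = []
    for onset, offset, _ in notes:
        events.append((onset, 1))    # note starts
        events.append((offset, -1))  # note ends
    # At equal times, ends (-1) sort before starts (+1), so back-to-back notes don't overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def track_category(notes: List[Note], is_drum: bool) -> str:
    """Hypothetical rule: drums -> perc, at most one note at a time -> mono, else poly."""
    if is_drum:
        return "perc"
    return "mono" if max_polyphony(notes) <= 1 else "poly"


flute_line = [(0.0, 1.0, 72), (1.0, 2.0, 74)]
piano_chord = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67)]
print(track_category(flute_line, False), track_category(piano_chord, False))  # mono poly
```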
Unification of Symbolic and Audio
An architecture that combines the advantages of both formats, harnessing the richness of audio data and the expressiveness of symbolic music.
Given an audio or symbolic input, musicological features (e.g., time signature) and melody features are extracted.
Four cascaded diffusion models (DMs) gradually generate features for monophonic (e.g., flute), polyphonic (e.g., piano) and percussion (e.g., drums) instruments.
The symbolic output is decoded and can optionally be converted to audio.
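The source does not say how the melody features are obtained from a multi-track symbolic input. As a stand-in, a common heuristic is the "skyline" rule: keep the highest-pitched note at each onset. The sketch below is a simple onset-based variant of that heuristic, swapped in for illustration only; the `skyline` function name and the `(onset, offset, pitch)` note format are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Note = Tuple[float, float, int]  # (onset, offset, midi_pitch), an assumed format


def skyline(notes: List[Note]) -> List[Note]:
    """Onset-based skyline: keep the highest note per onset, then clip overlaps."""
    by_onset: Dict[float, List[Note]] = defaultdict(list)
    for note in notes:
        by_onset[note[0]].append(note)
    # Highest pitch wins at each onset; onsets are visited in time order.
    melody = [max(group, key=lambda n: n[2]) for _, group in sorted(by_onset.items())]
    # Clip each melody note so it ends no later than the next one begins.
    clipped = []
    for i, (onset, offset, pitch) in enumerate(melody):
        if i + 1 < len(melody):
            offset = min(offset, melody[i + 1][0])
        clipped.append((onset, offset, pitch))
    return clipped


chords = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 72),
          (1.0, 2.0, 62), (1.0, 2.0, 67)]
print(skyline(chords))  # [(0.0, 1.0, 72), (1.0, 2.0, 67)]
```

This heuristic works best when the melody sits on top of the texture; inner-voice or bass melodies need a learned or score-aware extractor.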
Figure: Band-level music representation. Panels: Full MIDI; Main Melody; Reduced Mono., Reduced Poly., Reduced Perc.; Detailed Mono., Detailed Poly., Detailed Perc.
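The hierarchy in the figure (Full MIDI, Main Melody, then Reduced and Detailed levels for monophonic, polyphonic and percussion parts) can be sketched as a small data structure. The meaning of each level here is an assumption made for illustration: "reduced" merges all tracks of one category into a single note list, and "detailed" keeps each instrument's track separately. `BandRepresentation` and `build_representation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Note = Tuple[float, float, int]  # (onset, offset, midi_pitch), an assumed format
CATEGORIES = ("mono", "poly", "perc")


@dataclass
class BandRepresentation:
    full_midi: Dict[str, List[Note]]  # instrument name -> notes
    main_melody: List[Note]
    reduced: Dict[str, List[Note]] = field(default_factory=dict)  # category -> merged notes
    detailed: Dict[str, Dict[str, List[Note]]] = field(default_factory=dict)  # category -> per-instrument


def build_representation(full_midi: Dict[str, List[Note]],
                         categories: Dict[str, str],
                         main_melody: List[Note]) -> BandRepresentation:
    """Group instrument tracks by category into assumed reduced/detailed levels."""
    rep = BandRepresentation(full_midi=full_midi, main_melody=main_melody)
    for cat in CATEGORIES:
        tracks = {name: notes for name, notes in full_midi.items() if categories[name] == cat}
        rep.detailed[cat] = tracks
        # Reduced level (assumption): every note of the category in one time-sorted list.
        rep.reduced[cat] = sorted(n for notes in tracks.values() for n in notes)
    return rep


band = {"flute": [(0.0, 1.0, 72)], "piano": [(0.0, 1.0, 60), (0.0, 1.0, 64)],
        "strings": [(0.0, 2.0, 55)], "drums": [(0.0, 0.1, 36)]}
kinds = {"flute": "mono", "piano": "poly", "strings": "poly", "drums": "perc"}
rep = build_representation(band, kinds, main_melody=band["flute"])
print(sorted(rep.detailed["poly"]))  # ['piano', 'strings']
```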
From SINGLE-TRACK input to BAND-LEVEL output.
Example 1: Input Melody → Output Band-level Music
Example 2: Input Melody → Output Band-level Music
Example 3: Input Melody → Output Band-level Music
1. Handling MP3, WAV and MIDI music in a SINGLE framework.
Input: Violin Melody (AUDIO) → Output: Band-level Music
Input: Piano Melody (AUDIO) → Output: Band-level Music
Input: Human A Cappella (AUDIO) → Output: Band Accompaniment
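Supporting MP3, WAV and MIDI in one framework implies, at minimum, routing each input to an audio branch or a symbolic branch before feature extraction. The sketch below does that routing by file extension only. The branch labels and the `input_branch` function are hypothetical; a real system would also decode and validate the file contents.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav"}
SYMBOLIC_EXTS = {".mid", ".midi"}


def input_branch(path: str) -> str:
    """Return the (hypothetical) processing branch for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"      # would go to the audio encoder
    if ext in SYMBOLIC_EXTS:
        return "symbolic"   # would go to the symbolic encoder
    raise ValueError(f"unsupported input format: {ext!r}")


print(input_branch("violin_melody.WAV"))  # audio
```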
2. Ability to translate MP3 and WAV audio into MIDI.
Input: Single-track AUDIO → Converted MIDI
Input: Multi-track AUDIO → Converted MIDI
Input: Special Instrument AUDIO → Converted MIDI
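One small, well-defined piece of any audio-to-MIDI conversion is quantizing pitch: a frequency f in Hz maps to MIDI note 69 + 12·log2(f / 440), with A4 = 440 Hz = note 69. The sketch below applies that formula to a frame-wise pitch contour and merges runs of equal notes. It assumes the contour (`f0`, with 0 for unvoiced frames) already comes from some pitch tracker; this is not how UniComposer performs transcription, and `hz_to_midi` and `contour_to_notes` are hypothetical helpers.

```python
import math
from typing import List, Optional, Tuple


def hz_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = MIDI 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def contour_to_notes(f0: List[float], hop_sec: float) -> List[Tuple[float, float, int]]:
    """Merge consecutive frames with the same quantized pitch into (onset, offset, pitch)."""
    notes = []
    current: Optional[int] = None
    start = 0
    for i, freq in enumerate(f0 + [0.0]):  # trailing unvoiced frame flushes the last note
        pitch = hz_to_midi(freq) if freq > 0 else None
        if pitch != current:
            if current is not None:
                notes.append((start * hop_sec, i * hop_sec, current))
            current, start = pitch, i
    return notes


contour = [440.0, 440.0, 0.0, 261.63, 261.63, 261.63]  # A4, A4, silence, C4 x3
print(contour_to_notes(contour, hop_sec=0.5))  # [(0.0, 1.0, 69), (1.5, 3.0, 60)]
```

Real recordings need more than this (vibrato smoothing, onset detection for repeated notes, polyphonic pitch estimation), which is why multi-track and special-instrument audio are the harder cases.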