Automatic Music Transcription (AMT) is the task of converting a musical audio signal into a form of notation that describes the music, e.g., the pitch contour of the melody or the corresponding sequence of notes.
Source Separation (SS) is the task of separating an audio mixture, in which several instruments play simultaneously, into individual audio signals corresponding to its constituent sources, e.g., obtaining the vocals, drums and guitar from the recording of a pop song.
How are these two related?
SS can act as a valuable pre-processing step that facilitates the transcription of each source in the mix.
Hindustani classical vocal music consists of vocals as the lead, with the tabla for percussion, the harmonium or sarangi for melodic accompaniment, and a tanpura as a harmonic drone in the background. In this work-in-progress, we aim to build systems that can separate concert audio mixtures consisting only of vocals, tabla and tanpura, with the goal of transcribing the vocals (F0 tracking) and the tabla (stroke onset time and stroke type) more accurately.
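To make the F0 tracking step concrete, here is a minimal sketch of pitch estimation on a single frame using the autocorrelation peak. It is an illustration only, not the method used in this work: the function name `estimate_f0`, the search range of 80 to 800 Hz, and the synthetic 200 Hz sine standing in for an already separated vocal stem are all assumptions made for the example.

```python
import math


def estimate_f0(frame, sample_rate, fmin=80.0, fmax=800.0):
    """Estimate the F0 (in Hz) of one frame from its autocorrelation peak.

    Hypothetical helper for illustration; fmin and fmax are assumed bounds.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: fewer terms at longer lags, so the
        # first period is preferred over its multiples (fewer octave errors).
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a clean, separated vocal frame: a 200 Hz sine.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(1000)]

f0 = estimate_f0(frame, SAMPLE_RATE)
print(f0)  # prints 200.0
```

On a real concert mixture, the tabla strokes and the tanpura drone would add competing peaks to the autocorrelation, which is the motivation for running separation before a tracker of this kind.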