Hindustani classical vocal music typically features lead vocals, with the tabla for percussion and the harmonium for melodic accompaniment. In this work-in-progress, we aim to build systems that separate concert recordings into their individual sources, so that each instrument can be transcribed more accurately.
Dhrupad is a North Indian classical form of music with a rich and well-understood musical structure. This work devised methods to automatically detect and visualise this structure, and to predict section boundaries.
Also referred to as bol-padhant, the recitation of a tabla composition using vocalic syllables plays an important role in the oral tradition of pedagogy in North Indian classical music. This study examined how the expressivity in the vocal recitation of a tabla composition correlates with that in the corresponding playing on the tabla.
Motivated by the relationship between the filters in an orthogonal perfect reconstruction filter bank, we introduced similar constraints on the encoder and decoder in a convolutional auto-encoder and observed some interesting filter responses!
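One such relationship can be sketched numerically. In an orthogonal perfect reconstruction filter bank, the synthesis filters are time-reversed copies of the analysis filters, which in matrix form means the decoder is the transpose of the encoder. The sketch below is only an illustration of that property, not the constraint actually used in the project: it assumes a two-channel bank with stride 2, circular boundary handling, and the standard length-4 Daubechies filter pair, and the helper `analysis_matrix` is a hypothetical name.

```python
import numpy as np

def analysis_matrix(filters, n, stride=2):
    """Build the circular, strided correlation matrix for a bank of FIR filters.

    Each row applies one filter at one (strided) position, so `A @ x`
    plays the role of a strided convolutional encoder without bias.
    """
    rows = []
    for k in range(n // stride):
        for h in filters:
            row = np.zeros(n)
            for i, c in enumerate(h):
                row[(stride * k + i) % n] += c
            rows.append(row)
    return np.array(rows)

# Length-4 Daubechies low-pass filter and its quadrature-mirror high-pass
# (assumed here purely as a known orthogonal pair).
s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([h0[3], -h0[2], h0[1], -h0[0]])

n = 16
A = analysis_matrix([h0, h1], n)

rng = np.random.default_rng(0)
x = rng.standard_normal(n)

codes = A @ x        # "encoder": analysis filters with stride 2
x_hat = A.T @ codes  # "decoder": tied, transposed weights

pr_error = float(np.max(np.abs(x - x_hat)))
orth_error = float(np.max(np.abs(A @ A.T - np.eye(n))))
print(f"reconstruction error: {pr_error:.2e}, orthogonality error: {orth_error:.2e}")
```

In an auto-encoder, a constraint of this kind could be imposed by tying the decoder weights to the encoder and penalising the deviation of `A @ A.T` from the identity, though other formulations are possible.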
Pedagogy is an important avenue for music technology research to reach the public. With the goal of building a tool that can assess tabla-playing in learners, we first sought to answer these questions: How well can experienced players judge "stroke goodness" from only the audio (and no visual cues)? What attributes do they rely on? Do these attributes change with the type of stroke?
A pre-trained auto-encoder-based source separation system was re-written in C and ported to a floating-point TI DSP board. Even after some architecture changes, a single inference pass took a whopping 15 minutes. Still, it was good to see it working!
The tabla consists of two drums, each of which can produce a variety of sounds, and the two are often struck simultaneously. This makes it a good case for using non-negative matrix factorisation (NMF), both to separate the two drums and to transcribe individual strokes.
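As a rough illustration of the idea, NMF approximates a non-negative magnitude spectrogram V as a product W H, where the columns of W act as spectral templates and the rows of H as their activations over time. The sketch below uses the standard multiplicative updates for the Euclidean cost on a synthetic two-component "spectrogram"; the toy data, the rank of 2, and the function name `nmf` are assumptions for illustration and do not reflect the setup used in the project.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V ~= W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Updates for the Euclidean cost; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram: one low-frequency and one high-frequency template,
# sometimes active in the same frame (as when both drums are struck together).
freqs = np.arange(64)
low = np.exp(-0.5 * ((freqs - 8) / 3.0) ** 2)
high = np.exp(-0.5 * ((freqs - 40) / 5.0) ** 2)
W_true = np.stack([low, high], axis=1)   # shape (64, 2)
H_true = np.zeros((2, 40))
H_true[0, [2, 10, 18, 30]] = 1.0         # "drum 1" onsets
H_true[1, [10, 14, 30, 36]] = 1.0        # "drum 2" onsets
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_error = float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
print(f"relative reconstruction error: {rel_error:.3f}")
```

Peaks in each row of `H` then indicate when the corresponding template is active, which is what makes the decomposition a plausible starting point for stroke transcription.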
A tool that determines which popular singer you sound like, by encoding your voice as an embedding and finding the closest match among the embeddings of a few singers.
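The matching step can be sketched as a nearest-neighbour search over embeddings. The example below assumes cosine similarity as the closeness measure and uses made-up 3-dimensional vectors in place of real voice embeddings; the function `closest_singer` and the singer names are hypothetical, and the tool itself may use a different metric or encoder.

```python
import numpy as np

def closest_singer(query, reference):
    """Return the name and cosine similarity of the reference embedding nearest to `query`."""
    names = list(reference)
    refs = np.stack([reference[name] for name in names])
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = refs @ q                      # cosine similarity to each singer
    best = int(np.argmax(sims))
    return names[best], float(sims[best])

# Placeholder embeddings; in practice these would come from a voice encoder.
reference = {
    "singer_a": np.array([1.0, 0.0, 0.0]),
    "singer_b": np.array([0.0, 1.0, 0.0]),
    "singer_c": np.array([0.0, 0.0, 1.0]),
}
query = np.array([0.2, 0.9, 0.1])

name, score = closest_singer(query, reference)
print(name, round(score, 3))
```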
An interactive 3D environment in Unity for exploring and sampling from the latent space learned by a deep learning model trained on melodies.