My research career started in 2003 and has mainly been in the area of audio signal processing. A detailed description of the problems I have been working on is given below.
Amazon Music: May 2019 - Present
I am a Senior Applied Scientist with Amazon Music (Bengaluru and Sunnyvale). I work on audio fingerprinting, content recognition, content moderation and lyrics analytics, among other areas. I lead a team of 3 scientists, and many of the projects I have led are now part of various Amazon Music products and features. I have filed 2 patents on audio fingerprinting and on effective storage and retrieval of fingerprints.
Amazon Lab126, Sunnyvale, USA: Mar 2017 - Apr 2019
I worked as a Research Scientist in the Echo hardware team, applying signal processing and machine learning to new product innovation and to the research, development, testing and deployment of new features for the Echo family of devices.
I was the Research Scientist for the audio pipeline of the then newly announced Amazon Echo Auto - https://www.amazon.com/Introducing-Echo-Auto-first-your/dp/B0753K4CWG. I have 2 granted patents in the domain of audio beamforming and selection for Echo Auto type devices.
Cisco Systems Ltd, San Jose, USA: Sep 2016 - Mar 2017
I was an IT Engineer in the Enterprise Architecture team, using machine learning approaches for NLP and data analysis. I worked on 2 prototype applications during my time at Cisco.
Samsung R&D Institute, Bengaluru, India: May 2015 - Jun 2016
My work was mainly to design and develop prototype algorithms and proofs of concept (POCs) for audio applications on mobile and wearable devices. It resulted in 3 POCs in the area of music browsing and retrieval and 1 algorithm for a then soon-to-be-released wearable device. I won the Employee of the Month award for January 2016.
Fraunhofer IIS, Erlangen, Germany: May 2012 - Apr 2015
I worked for nearly 18 months in the area of audio coding, specifically on bandwidth extension. This work resulted in 4 patents and a new bandwidth extension algorithm that was accepted into the MPEG-H standard. I then worked for another 18 months on sonification, automotive audio and auditory displays, resulting in a total of 5 publications. I also co-guided 2 Master's theses and mentored an intern.
PhD: Aug 2006 - Apr 2012
My PhD was in the area of music source separation using a hierarchical approach. My work resulted in 10 publications in various national and international conferences.
1) A psychoacoustically motivated approach to onset/offset detection in polyphonic music.
Moore's loudness model has been used to detect onsets and offsets in complex sound mixtures. The algorithm rectifies a disadvantage of Klapuri's onset detection technique and performs reasonably well on our database. It also performed well in the MIREX 2010 onset detection task. The summary results are available at http://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/summary.html.
Detailed results can be viewed here: http://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/resultsperclass.html
My results in the subclasses match the state of the art in 4 of the 9 categories.
2) Transient segment detection, extraction and enhancement of audio
a) Using an extension of the loudness model described above, we have a simple one-shot algorithm that identifies transient segments in music.
b) We have developed an improved transient detection algorithm based on an iterative analysis of the STFT of the signal. It outperforms the one-shot algorithm in (a) and also extracts the transient segment as a by-product.
3) Harmonic vs. percussion separation in polyphonic music
One of the important steps in music analysis is to separate the individual instruments that make up the auditory scene. Towards this goal, we have developed a spectrogram-diffusion-based algorithm that uses anisotropic diffusion to separate percussive instruments from harmonic instruments and vocals.
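As a rough illustration of the underlying idea (not the published algorithm), harmonic components form horizontal ridges in a magnitude spectrogram and percussive components form vertical ridges, so smoothing along time favours the former and smoothing along frequency favours the latter. The sketch below uses plain linear diffusion along one axis at a time and a soft mask; the function names, the iteration count and the step size are assumptions made for this example.

```python
import numpy as np

def diffuse_along(S, axis, n_iter=30, rate=0.25):
    """Explicit 1-D heat diffusion of S along a single axis (rate <= 0.5 for stability)."""
    x = np.moveaxis(np.asarray(S, dtype=float), axis, 0).copy()
    for _ in range(n_iter):
        # Replicate the edges so that no energy flows across the boundary.
        padded = np.concatenate([x[:1], x, x[-1:]], axis=0)
        x = x + rate * (padded[:-2] + padded[2:] - 2.0 * x)
    return np.moveaxis(x, 0, axis)

def separate_harmonic_percussive(S, n_iter=30, eps=1e-12):
    """Split a magnitude spectrogram S (frequency x time) into harmonic and percussive parts."""
    H = diffuse_along(S, axis=1, n_iter=n_iter)  # smooth along time: keeps horizontal ridges
    P = diffuse_along(S, axis=0, n_iter=n_iter)  # smooth along frequency: keeps vertical ridges
    mask_h = H / (H + P + eps)                   # soft mask, values in [0, 1]
    return mask_h * S, (1.0 - mask_h) * S

# Toy spectrogram: one sustained tone (row 10) and one drum hit (column 20).
S = np.zeros((64, 64))
S[10, :] = 1.0
S[:, 20] += 1.0
harm, perc = separate_harmonic_percussive(S)
```

Because the two masks sum to one, the harmonic and percussive outputs always add back up to the input spectrogram.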
4) Post-processing of the harmonic/percussion outputs to improve separation
We have proposed two algorithms to improve separation for Indian music.
a) A subband-based median filtering approach to reduce vocal and harmonic instrument leakage in the percussive components of polyphonic signals.
b) A percussion-directed processing scheme that alternately reduces percussive leakage in the harmonic component and harmonic leakage in the percussive component of polyphonic signals.
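The intuition behind approach (a) can be sketched as follows, under stated assumptions: percussive energy is broadband and therefore smooth across frequency, while leaked vocal or harmonic partials show up as narrow peaks, which a median filter along frequency removes. The band edges, the per-band kernel lengths and the function names below are hypothetical choices for illustration and are not taken from the original work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_freq(X, kernel):
    """Median filter along the frequency axis (axis 0); kernel must be odd."""
    half = kernel // 2
    padded = np.pad(X, ((half, half), (0, 0)), mode="edge")
    windows = sliding_window_view(padded, kernel, axis=0)  # shape: (freq, time, kernel)
    return np.median(windows, axis=-1)

def suppress_harmonic_leakage(P, band_edges, kernels):
    """Filter each subband of the percussive magnitude spectrogram P separately."""
    out = np.empty_like(P, dtype=float)
    for lo, hi, k in zip(band_edges[:-1], band_edges[1:], kernels):
        smoothed = median_filter_freq(P[lo:hi], k)
        # Only ever attenuate: keep the smaller of the original and the filtered value.
        out[lo:hi] = np.minimum(P[lo:hi], smoothed)
    return out

# Toy percussive estimate: flat broadband energy plus one leaked partial in bin 12.
P = np.full((32, 8), 0.5)
P[12, :] += 3.0
cleaned = suppress_harmonic_leakage(P, band_edges=[0, 16, 32], kernels=[5, 9])
```

Using a separate kernel length per subband lets the filter be short where partials are closely spaced and longer where they are sparse, which is one plausible motivation for working in subbands.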
5) Lead vocal extraction from music
We have proposed the use of harmonic/percussive source separation (HPSS) as a pre-processing stage for vocal extraction, in combination with a well-known pitch tracking algorithm.
Problems attempted during MSc (Engg): Aug 2003 - May 2006
I worked primarily on music information retrieval (MIR), with a focus on Query by Example (QBE) and speech/music discrimination. This work resulted in 4 publications.