Qualcomm India Pvt. Ltd. | May 2025 - July 2025
Application Software Center, IIT Bombay | May 2023 - July 2023
Guide: Prof. Vikram M. Gadre, Electrical Engg., IIT Bombay
[Ongoing Research]
Guide: Prof. Ganesh Ramakrishnan, Computer Science and Engg., IIT Bombay
Conditional Flow Matching (CFM) models for text-to-speech often generate mel-spectrograms with unstable frequency evolution: low-frequency components grow too quickly while high-frequency details lag behind. This project introduces a training-free, frequency-selective boosting technique that dynamically adjusts spectral sub-bands during ODE integration using the Discrete Wavelet Transform (DWT). By enhancing lagging frequencies and tempering overly dominant ones, the method stabilizes the generative process and improves audio quality. Integrated with F5-TTS and Voicebox, it achieves significantly better perceptual quality, as measured by Fréchet Audio Distance, without sacrificing intelligibility, as measured by Word Error Rate, and offers finer control over CFM-based TTS generation.
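The general idea of reweighting wavelet sub-bands inside an ODE step can be illustrated with a minimal sketch. This is not the project's implementation: the one-level Haar wavelet, the choice to transform along the mel-frequency axis, applying the gains to the velocity rather than the state, and the names `reweight_bands`, `euler_step`, `low_gain`, and `high_gain` are all assumptions made for illustration.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT along the last axis (its length must be even)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency sub-band
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency sub-band
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even/odd samples."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],), dtype=approx.dtype)
    out[..., 0::2] = even
    out[..., 1::2] = odd
    return out

def reweight_bands(v, low_gain, high_gain):
    """Scale the low and high sub-bands of v by separate (hypothetical) gains."""
    approx, detail = haar_dwt(v)
    return haar_idwt(low_gain * approx, high_gain * detail)

def euler_step(x, v, dt, low_gain=1.0, high_gain=1.0):
    """One Euler step of dx/dt = v with a band-reweighted velocity.

    x, v: float arrays shaped (frames, n_mels). Assumption: the gains act on
    the velocity; the actual method may act on the state or use a schedule.
    """
    return x + dt * reweight_bands(v, low_gain, high_gain)
```

With both gains set to 1.0 the step reduces to plain Euler integration, so a sketch like this can be dropped into a sampler without changing its default behavior; a `high_gain` above 1.0 boosts the lagging high-frequency band, and a `low_gain` below 1.0 tempers the dominant low band.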
[Work under review]
Guide: Prof. Biswabandan Panda, Computer Science and Engg., IIT Bombay
Guide: Prof. Preeti Rao, Electrical Engg., IIT Bombay
This supervised R&D exposition project investigates automatic music transcription (AMT) techniques for monophonic Western and Carnatic guitar melodies. It examines the core AMT components (onset detection, pitch estimation, note tracking, and technique recognition), focusing on expressive articulations such as slides, bends, vibrato, hammer-ons, and pull-offs. A dedicated dataset of recorded guitar articulations, complemented by publicly available Carnatic performances, supports a systematic evaluation of established methods, including spectral-flux onset detection, probabilistic pitch estimation, and the TENT framework. While current approaches perform reliably for Western guitar, they struggle with the rapid slides and microtonal nuances characteristic of the Carnatic style. The project highlights key limitations in existing AMT systems and emphasizes the need for algorithms that capture stylistic subtleties across diverse musical traditions more effectively.
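For reference, spectral-flux onset detection, one of the established methods evaluated, can be sketched in a few lines. This is a generic textbook formulation, not the project's code; the frame length, hop size, and the simple relative threshold in `pick_onsets` are assumed values chosen for illustration.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Half-wave rectified spectral flux, one value per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrogram
    diff = np.diff(mag, axis=0)                 # frame-to-frame change per bin
    flux = np.sum(np.maximum(diff, 0.0), axis=1)  # keep energy increases only
    return np.concatenate([[0.0], flux])        # first frame has no predecessor

def pick_onsets(flux, threshold_ratio=0.5):
    """Return frame indices that are local maxima above a relative threshold."""
    threshold = threshold_ratio * flux.max()
    onsets = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
            onsets.append(i)
    return onsets
```

Frame index `i` corresponds to sample `i * hop`. Because the flux only counts increases in spectral magnitude, a sketch like this responds to sharp note attacks but gives weak peaks for the smooth pitch glides of slides and bends, which is consistent with the limitation noted above for the Carnatic style.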
Interpretable, AI-powered Dream-Team builder chatbot supported by RAG on Llama 3.2
Optimal input subset sampling for low-compute crowdsourced frames for 3D scene generation