Below are several short lightning talks to give you a sampling of projects that we have worked on.
Have you ever recognized the composer of a piece of music, even though you had never heard the piece before? In this project, we wanted to know if we could build a machine learning system to recognize the composer of a page of sheet music based on its compositional style. This talk by Daniel Yang describes how we were able to accomplish this. The paper was one of three nominated for best student paper at ISMIR 2021. A longer 10-minute oral presentation is also available here, if you're interested.
Our goal in this project was to build a system that could recognize a page of sheet music based on a cell phone picture. To do this, we scraped a website containing tons of sheet music images (IMSLP), created a searchable database, and designed a method for fast and accurate search. This talk by Kevin Ji describes our "Marketplace" fingerprinting system, in which we frame the retrieval problem as an economics scenario and propose a solution that corresponds to an efficient marketplace. This approach is able to search all of IMSLP in under a second with high retrieval accuracy!
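To give a rough sense of how fingerprint-based retrieval can be this fast, here is a minimal sketch of an inverted-index lookup. The fingerprint representation and data layout below are illustrative assumptions for clarity, not the actual Marketplace system described in the talk.

```python
# Illustrative sketch of fingerprint-based retrieval with an inverted index.
# The fingerprint encoding and data layout are assumptions, not the actual
# "Marketplace" system from the talk.
from collections import defaultdict

def build_index(database):
    """database: dict mapping page_id -> list of integer fingerprints."""
    index = defaultdict(list)
    for page_id, fingerprints in database.items():
        for fp in fingerprints:
            index[fp].append(page_id)
    return index

def query(index, query_fingerprints):
    """Return page ids ranked by how many query fingerprints they share."""
    votes = defaultdict(int)
    for fp in query_fingerprints:
        for page_id in index.get(fp, []):
            votes[page_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

Because each query fingerprint only touches the database pages that contain it, the lookup cost scales with the number of matches rather than the size of the full collection, which is what makes sub-second search over a very large database plausible.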
When Covid was shutting everything down, we asked ourselves the question, "Is there a way we can collaborate musically even in quarantine?" To answer this question, three of us learned a movement from a Mendelssohn piano trio over winter break, recorded ourselves performing our individual parts in isolation, and then designed a system to align, time-stretch, and mix the recordings in a way that produces a complete performance. This talk by Kevin Ji describes our musical adventure. As fun as the research project was, the best part was returning to campus and recording it together in person!
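As a minimal sketch of the time-stretch-and-mix step (an assumption-laden illustration, not the actual system from the talk): each part is stretched to match the duration of a reference part and the results are averaged together.

```python
# Hedged sketch: stretch every part to the duration of the first part, then
# mix. The real system aligns the performances much more carefully; the
# function and variable names here are illustrative assumptions.
import librosa

def stretch_and_mix(paths, sr=22050):
    """paths: list of audio files, one per part; returns a mono mixture."""
    parts = [librosa.load(p, sr=sr)[0] for p in paths]
    ref_len = len(parts[0])
    stretched = []
    for y in parts:
        # rate > 1 speeds a part up, so this matches each duration to the reference
        rate = len(y) / ref_len
        stretched.append(librosa.effects.time_stretch(y, rate=rate))
    n = min(len(y) for y in stretched)
    return sum(y[:n] for y in stretched) / len(stretched)
```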
The goal of this project was to teach a machine how to follow along in sheet music. The main technical challenge in this problem is determining the alignment between pixels in the sheet music images and time instants in the audio. This talk by Mengyi Shan describes an automated approach to generating piano score-following videos given a set of sheet music images and an audio recording.
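One common way to frame this kind of alignment, sketched below under the assumption that both the sheet music and the audio have already been converted into comparable feature sequences, is to run dynamic time warping and read off a mapping from audio frames to score positions. The actual features and method used in the project are described in the talk.

```python
# Hedged sketch of the alignment step: given feature sequences for the score
# and the audio in a shared feature space, DTW yields a warping path that maps
# each audio frame to a score position, which can drive a score-following video.
import librosa

def frame_to_score_position(score_features, audio_features):
    """Inputs: (d, n) and (d, m) feature matrices in a shared feature space."""
    _, wp = librosa.sequence.dtw(X=score_features, Y=audio_features)
    wp = wp[::-1]  # warping path from start to end: (score_frame, audio_frame)
    mapping = {}
    for score_frame, audio_frame in wp:
        # Keep the first score frame matched to each audio frame.
        mapping.setdefault(int(audio_frame), int(score_frame))
    return mapping
```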
The goal of this project was to explore different ways of generating music in the form of readable sheet music images. We explored five different approaches, all built on Transformer-based language models. This talk by Irmak Bukey shares the insights we gained in the process. She and co-author Marcos Acosta presented this work at ISMIR 2022.
Nowadays it is easy to generate deepfake audiovisual content or to tamper with existing audiovisual recordings using editing software. In this project, we explore a way to verify that audio has not been maliciously tampered with in a specific context: short viral videos taken from broadcast news recordings. Rather than trying to detect artifacts of tampering (internal consistency), we instead focus on positively verifying an audio clip against a trusted source such as a recording from a major news agency (external consistency). This talk by Arm Wonghirundacha describes our approach.
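As a loose illustration of external consistency checking (an assumption-based sketch, not the system described in the talk), one could locate the clip inside a trusted reference recording and score how well the two signals agree.

```python
# Illustrative sketch of "external consistency": find where the clip sits in a
# trusted reference recording via cross-correlation, then compute a crude
# agreement score. A tampered clip should match its claimed source poorly.
import numpy as np
from scipy.signal import correlate

def verify_against_reference(clip, reference):
    """Both arguments are mono waveforms at the same sample rate."""
    # Find the offset where the clip best matches the trusted reference.
    corr = correlate(reference, clip, mode='valid')
    offset = int(np.argmax(corr))
    segment = reference[offset:offset + len(clip)]

    # Normalized correlation between the clip and the matched segment.
    score = np.dot(clip, segment) / (np.linalg.norm(clip) * np.linalg.norm(segment) + 1e-8)
    return offset, score
```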
One of the most widely used tools for analyzing time series data is an algorithm called dynamic time warping (DTW). Because DTW is a dynamic programming algorithm, it is typically computed sequentially on a single CPU. The goal of this project was to explore ways to speed up exact or approximate DTW through parallelization. Thaxter Shaw and Daniel Yang later expanded upon this work to develop a highly parallelized GPU implementation for exact DTW.
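To make the dependency structure concrete, here is a textbook DTW computation: each cell depends only on its left, upper, and upper-left neighbors, so cells along the same anti-diagonal are mutually independent, which is exactly the property a parallel or GPU implementation can exploit as a "wavefront."

```python
# Textbook DTW cost computation. A sequential implementation fills the matrix
# cell by cell; cells with the same i + j are independent of one another and
# can in principle be computed in parallel.
import numpy as np

def dtw_cost(x, y):
    """x: (n, d), y: (m, d) feature sequences; returns the DTW distance."""
    n, m = len(x), len(y)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # pairwise distances
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```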
You can find a more exhaustive list of research talks (and some fun musical collaborations) here.