Invited Speakers

We are excited to announce our fantastic line-up of Keynote and Industry Speakers at NLP4MusA!

Keynote Speakers

Colin Raffel, Assistant Professor at University of North Carolina, Chapel Hill & Senior Research Scientist at Google Brain

What can MIR learn from transfer learning in NLP?

Transfer learning has become the de facto pipeline for natural language processing (NLP) tasks. The typical transfer learning recipe trains a model on a large corpus of unstructured, unlabeled text data using a self-supervised objective and then fine-tunes the model on a downstream task of interest. This recipe dramatically reduces the need for labeled data and has led to incredible progress on many benchmarks that had previously been far out of reach. In this talk, I'll first give an overview of transfer learning for NLP from the lens of our recent empirical survey. Then, I will argue that transfer learning is massively underutilized in the field of music information retrieval (MIR), particularly in light of the scarcity of labeled music data. To prompt future research, I'll highlight some successful applications of transfer learning in MIR and discuss my own work on creating a large, weakly-labeled music dataset.
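For readers less familiar with the recipe described above, here is a minimal sketch of the fine-tuning half of the pipeline, assuming the Hugging Face transformers library; the t5-small checkpoint, the mood-tagging task, and the single training example are purely illustrative and not part of the talk.

```python
# Minimal sketch of the "pre-train then fine-tune" transfer learning recipe.
# Assumptions: Hugging Face `transformers` and PyTorch are installed; the task,
# prompt, and label below are hypothetical examples, not from the talk.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # weights from self-supervised pre-training

# Hypothetical downstream task: tagging a lyric snippet's mood.
inputs = tokenizer("classify mood: rain keeps falling on my window", return_tensors="pt")
labels = tokenizer("melancholy", return_tensors="pt").input_ids

# One fine-tuning step: the pretrained weights are updated on the small labeled example.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(input_ids=inputs.input_ids, labels=labels).loss
loss.backward()
optimizer.step()
```

In practice the downstream set would contain many labeled examples, but the point of the recipe is that the self-supervised pre-training does most of the work, so relatively little labeled data is needed.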

Colin Raffel is a Senior Research Scientist at Google Brain. His current work focuses on learning from limited labels, including semi-supervised, unsupervised, and transfer learning, and on exploring the limits of these techniques with transformers in natural language processing. Colin was one of the pioneers in the use of deep learning for MIR and the first to introduce attention-based methods for audio processing. He completed his Ph.D. on audio-to-MIDI alignment in LabROSA at Columbia University under Dan Ellis in 2016. He will be joining the Computer Science department at the University of North Carolina, Chapel Hill in Fall 2020.

Sam Mehr, Director of the Harvard Music Lab

Universality and diversity in human song

Understanding the basic design of the human psychology of music is an essential prerequisite for intelligent models of musical understanding, listener preferences, machine listening, and all things music in the digital age. In this talk I will present research using the Natural History of Song Discography to ask: What is universal about the psychology of music, and what varies? Using data from music information retrieval, amateur and expert listener ratings, and manual transcriptions, we find that acoustic features of songs predict their primary behavioral context across cultures. In behavioral experiments, we show that infants and young children are sensitive to the musical features of unfamiliar, foreign music. Last, in new work, we find that listener preferences are strikingly similar worldwide: while songs vary widely in how pleasant or unpleasant listeners find them, those ratings show little variability across listeners' countries of origin. I'm eager to discuss new industry-academia collaborations as we discover more about the psychology of music.

Sam Mehr is the director of the Harvard Music Lab. He studies music: how the design of the human mind leads us to perceive, create, and engage with music, and how this psychology of music may be leveraged to improve health outcomes in infancy and adulthood. These questions are multidisciplinary, drawing insights from the cognitive sciences, evolutionary biology, anthropology, ethnomusicology and music theory, linguistics, and computer science.

Industry Speakers

Tao Ye, Sr. Applied Science Manager at Amazon

Inside a real world conversational music recommender

When a user asks Alexa “Help me find music”, there are in fact a multitude of interesting problems to be solved at the intersection of natural language understanding, recommendation systems, and advanced natural language generation. In natural language understanding, we encounter intent identification, slot filling, and challenges particular to spoken language understanding (SLU) in the music domain. A dialog also differs from one-shot command SLU, where users tend to give a clear “play XYZ” intent: in a dialog, users vary their speech more and often answer a question casually, such as “whatever, I don’t care”. We rely on both grammatical rules and statistical models to set intents and triggers and to fill slots. Machine learning is also applied directly to construct an interactive recommender that makes recommendations more relevant. With real-time user critique and feedback, we need to integrate long-term user preferences and immediate user requests. Finally, how Alexa speaks to the user also makes a difference in the experience. We tackle the tough problem of making the entire conversation sound natural rather than robotic; in particular, we use speech tagged as emotional and empathic. The challenge is knowing when to use these tags to vary the speech.
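As a rough illustration of combining grammar rules with a statistical fallback for intent detection and slot filling, here is a toy sketch; it is not Amazon's implementation, and the intent names, regular expressions, and `statistical_model.predict` interface are hypothetical.

```python
# Toy sketch: rule-based intent/slot parsing with an optional statistical fallback.
import re

RULES = [
    # A clear one-shot command: "play <something>"
    (re.compile(r"^play (?P<entity>.+)$", re.I), "PlayMusic"),
    # Casual, indifferent answers that show up mid-dialog.
    (re.compile(r"\b(whatever|i don'?t care)\b", re.I), "NoPreference"),
]

def parse_turn(utterance, statistical_model=None):
    """Return (intent, slots); grammar rules first, statistical model as fallback."""
    for pattern, intent in RULES:
        match = pattern.search(utterance)
        if match:
            return intent, match.groupdict()
    if statistical_model is not None:
        return statistical_model.predict(utterance)  # hypothetical classifier interface
    return "Unknown", {}

print(parse_turn("play jazz for studying"))   # ('PlayMusic', {'entity': 'jazz for studying'})
print(parse_turn("whatever, I don't care"))   # ('NoPreference', {})
```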

Dr. Tao Ye is a Sr. Applied Science Manager leading the Conversational AI team at Amazon Music. Prior to joining Amazon, she spent 9+ years at Pandora SXM building recommendation systems and leading a personalization, search, and voice science team. She has two decades of experience in the software industry, holds 14 granted patents, and has published 12 peer-reviewed papers. In the RecSys research community, she co-founded and co-chaired the Large Scale Recommender Systems workshop for 5 years (2013-2017) and has given numerous invited talks at conferences and RecSys workshops, most recently a guest lecture on building large-scale recommendation systems at the 2019 Latin America Recommender School. She received her PhD from the University of Melbourne in Electrical and Electronic Engineering, her MS from UC Berkeley in EECS, and dual BS degrees from Stony Brook University in CS and Engineering Chemistry.

Fabien Gouyon, Head of Research, Europe, Pandora/SiriusXM

Lean-back or Lean-in?

In this talk I will go over some of Pandora’s latest research and product developments in the realm of voice interactions. I will address how NLU powers unique music listening experiences in the Pandora app, and highlight exciting opportunities for further research and development.

I lead Pandora/SiriusXM's data science research team in Europe. We work on Machine Learning, Music Information Retrieval, Recommender Systems, and Natural Language Processing, and apply our research to personalized music and audio recommendations. I was president of the International Society for Music Information Retrieval until 2017.

Elena Epure, Marion Baranes & Romain Hennequin, Research Scientists, Deezer

“Je ne parle pas anglais”, dealing with multilingualism in MIR

Deezer is a local player, on a global scale. Our goal is to serve a very diverse audience, providing a seamless experience worldwide. Consequently, dealing with multilingualism, and more generally with multiculturalism, is essential to us. In this talk, we address two topics for which the generalisation to multilingual data and users is particularly important: the user-app interaction through the search engine and the catalogue annotation with multilingual metadata. We conclude by contemplating the state of multilingualism in the music information retrieval (MIR) community.

Elena V. Epure, Marion Baranes and Romain Hennequin are Research Scientists at Deezer. Elena and Marion address natural language processing problems emerging from MIR. Romain is the scientific leader of the team which focuses on diverse topics such as music analysis, information retrieval, machine learning and recommendation.

Sravana Reddy, Sr. ML Researcher/Engineer at Spotify

The Spotify Podcasts Dataset

We present the Spotify Podcasts Dataset, a set of approximately 100K podcast episodes comprising raw audio files along with accompanying ASR transcripts, which we released for the TREC 2020 Challenge. We will talk about some of the characteristics of this dataset and our experiments running baseline models for information retrieval and summarization.

Sravana Reddy works in the language technologies lab within Spotify Research, which carries out externally facing research and also collaborates with product teams.

Rosa Stern, Sr. Computational Linguist, Sonos & Alice Coucke, Head of ML Research, Sonos

Music data processing for voice control

The focus of the Voice Experience team at Sonos is to bring together the profuse world of music and the slick user experience of a voice assistant within the Sonos home sound system. Supporting music-related voice commands and a music catalog in our SLU (Spoken Language Understanding) system carries challenges at the various stages of our pipeline, which we’ll present and discuss in this talk. We’ll focus on the main issues we encounter in our data processing pipeline, especially those related to speech and voice recognition.

Rosa Stern and Alice Coucke work in the Voice Experience team at Sonos. Rosa is a computational linguist working with the language and SLU teams on data processing and topics crossing NLP and ML. Alice is Head of machine learning research, focusing on a variety of topics in SLU and acoustic modeling for voice assistants.

Isaac Julien, Research Engineer, Bose & Shuo Zhang, Sr. ML Engineer, Bose

Building a Personalized Voice Assistant for Music

The Bose Music Assistant was a year-long research project focused on building a personalized, conversational voice interface for music, with the goal of helping our customers find the content they enjoy. We will discuss the creation of a hybrid grammar- and ML-based NLU engine that supported the Assistant and allowed us to quickly prototype and expand the experiences it offered. We will also describe some of the NLP challenges we encountered in the music domain, and the opportunity these challenges provided for personalization.

Isaac Julien is a Research Engineer at Bose, where he prototypes Machine Learning-enabled experiences for future wearable products.
