Project Overview

Music recommender systems (MRS) have exploded in popularity thanks to music streaming services like Spotify, Pandora, and Apple Music. More than half of current music consumption is by way of these services. While recommender systems have been around for some time and are well researched, music recommender systems differ from their more common siblings in several important ways: items are much shorter (3-5 minutes for a song vs. 90 minutes for a movie, or months or years for a book or shopping item); items are consumed in sequence, with multiple items per session; repeated recommendations carry a different significance, since hearing the same song again as part of a different playlist may be fine; and consumption is often passive, with music playing in the background. Music recommender systems therefore require different approaches from traditional recommender systems.

One of the major problems in music recommender systems is the station/playlist generation problem. At its heart, playlist generation is about finding a set of songs that best extends the experience of a listener who is in the midst of a playlist. By suggesting appropriate songs to add to a playlist, a recommender system can increase user engagement both by making playlist creation easier and by extending listening beyond the end of the existing playlist.

One of Spotify’s primary products is Playlists, collections of tracks that individual users (or Spotify) can build for every mood or event. Spotify users can make or follow as many playlists as they like. With over 50 million tracks available, the company attempts to direct the most relevant songs to users based on their preferences, and Playlists are often the most convenient and effective way to convey these recommended songs to users.

Spotify participates in the creation and curation of Playlists that are followed by millions of Spotify users. These Playlists are compiled in a complex manner, involving both human-led and computer-led processes. What is clear is that algorithmically curated discovery playlists, and their effectiveness, remain an important business interest for the company. The goal of this project is to better understand how these algorithms can be evaluated and improved with the machine learning techniques learned in class.

Problem Statement and Motivation

A common problem in playlist generation is the "cold-start" problem. When a playlist contains very few songs, when a song has just been released, or when a song is very unpopular, there is little listening data to draw on, so it is difficult for some recommendation algorithms to provide recommendations that are meaningful and useful to users. The motivation of this project is to address this "cold-start" problem.
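
To make the cold-start issue concrete, the sketch below contrasts a playlist co-occurrence count, which has nothing to say about a track that appears in no playlist, with a content-based fallback that compares feature vectors. The playlists, track names, and three-number feature vectors are invented for illustration, and the cosine-similarity fallback is an assumption about one possible remedy; it is a minimal sketch of the general idea, not the method used in this project.

```python
import math
from collections import Counter

# Toy data, invented for illustration only.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_b", "song_c"],
]

# Hypothetical content features per track (for example, scaled audio features).
features = {
    "song_a": [0.9, 0.1, 0.3],
    "song_b": [0.8, 0.2, 0.4],
    "song_c": [0.1, 0.9, 0.7],
    "new_song": [0.85, 0.15, 0.35],  # just released: appears in no playlist yet
}


def cooccurrence_scores(seed, playlists):
    """Count how often each other track shares a playlist with `seed`."""
    counts = Counter()
    for tracks in playlists:
        if seed in tracks:
            counts.update(t for t in tracks if t != seed)
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_neighbors(seed, features, k=2):
    """Return the k tracks whose feature vectors are most similar to `seed`."""
    scored = [
        (cosine(features[seed], vec), name)
        for name, vec in features.items()
        if name != seed
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Co-occurrence has no evidence for the new track, so the counter is empty.
print(cooccurrence_scores("new_song", playlists))
# Content features still allow a comparison with existing tracks.
print(content_neighbors("new_song", features))
```

The point of the contrast is that a co-occurrence model can only rank tracks it has already seen in playlists, whereas features derived from the track itself (audio, tags, lyrics) are available from the moment the track exists.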

Data Used

To tackle this problem, we used the following four datasets.

The Million Playlist Dataset consists of numerous playlists and all of the songs included in each playlist.

The Million Song Dataset provides user-added tags that characterize each song.

Genius Lyrics is an API that provides lyrics data for a given song.

The Spotify API provides 13 audio features for a given song.
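
As a rough illustration of how lyrics text could be turned into numeric features that sit alongside the audio features, the sketch below computes TF-IDF weights in plain Python. The lyric snippets are invented placeholders rather than Genius output, the tokenizer is deliberately naive, and the use of TF-IDF is itself an assumption (it is one of the techniques covered in the Kaggle notebook listed under the references), not a confirmed part of this project's pipeline.

```python
import math
from collections import Counter

# Placeholder lyric snippets, invented for illustration (not real Genius data).
lyrics = {
    "track_1": "love love me tonight",
    "track_2": "dance all night tonight",
    "track_3": "rain falls on me",
}


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf(docs):
    """Return {doc_id: {term: weight}} with tf = count / length, idf = ln(N / df)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    weights = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        weights[doc_id] = {
            term: (count / len(words)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


weights = tfidf(lyrics)
# The highest-weighted term is the one most distinctive for that track.
top_term = max(weights["track_1"], key=weights["track_1"].get)
print(top_term)
```

Terms that are frequent in one song but rare across the collection receive the largest weights, which is why such vectors can help describe a song that has no playlist history.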

Literature Review / Past Works

As a reference, we were inspired by the following papers.

"Lyric Text Mining in Music Mood Classification" by Xiao Hu, J. Stephen Downie, Andreas F. Ehmann
"Neural Network Based Next-Song Recommendation" by Kai-Chun Hsu, Szu-Yu Chou, Yi-Hsuan Yang, Tai-Shih Chi

We were also inspired by the following websites.

Basic NLP: Bag of Words, TF-IDF, Word2Vec, LSTM https://www.kaggle.com/reiinakano/basic-nlp-bag-of-words-tf-idf-word2vec-lstm
Lyrics Genius Github Repo https://github.com/johnwmillr/LyricsGenius
