how does the spotify playlist algorithm even work?
Artificial Intelligence (AI) has become inescapable. Whether it’s helping with tough homework or assisting with navigating a website, AI has permeated nearly every sector of life. In February 2023, AI entered the music-listening interface in a way never seen before: an ultra-personalized mobile DJ that keeps you listening using the Spotify algorithm. While other streaming services have offered music suggestions for over a decade, none of them have mastered the algorithm like Spotify. After the original rollout, Spotify kept expanding its algorithm; the most recent update lets users create a playlist purely from a text prompt, making it easier than ever to find new music and build playlists.
A Spotify promotional image for their algorithm-generated playlists. (Spotify)
The product created a positive feedback loop for business. Consumers could easily discover new music, keeping them streaming on Spotify while showing the listener new music they wouldn’t find without a deep dive. The music discovery potential even became a selling point in the battle for streaming superiority, a testament to the value of the technology.
With any highly valuable technology, secrets are kept to prevent rivals from copying it, and Spotify’s AI is no different. While researching Spotify’s AI, the best explanation I could find told me that it uses your listening data to create playlists. Duh. I wanted to dig deeper. I was especially curious how Spotify’s AI builds playlists around specific activities and moods. To a human, it may seem obvious that “The Sound of Silence” by Simon and Garfunkel isn’t a great workout song, but how does a computer, with no emotions of its own, decide this?
Because Spotify’s AI model is so valuable, this information is not public. But statistical models can map how the AI might group songs, revealing tendencies of the model from the outside.
The Setup
To begin this project, we need data. I was curious about how the AI differentiates between its auto-generated “activity” mixes. I wanted two extremes and what seemed to be a middle ground, so I chose the Spotify-generated “Hype Workout Mix,” “Homework Mix” and “Sleepy Weepy Mix.” To account for personal taste, I used the playlists of four Michigan Daily Music Beat Writers who listen to a diverse set of genres (their top genres were listed as Hip-Hop/Rap, Electro-Pop, Indie Rock and Art Pop) to cover a wide variety of generated mixes. From each writer, I analyzed four 50-song playlists, meaning each person gave me 200 songs to look at.
After a bit more digging, I found the Spotify Application Programming Interface (API), which reveals the metrics Spotify uses to describe songs for its seemingly limitless playlists. Using a third-party website built on the API (credited below), I pulled these values for every song in each playlist.
The values Spotify uses are:
Beats per Minute (BPM): The tempo of the song.
Energy: The higher the number, the more energy. (0-100)
Danceability: The higher the number, the more “danceable” (0-1)
Loudness: The higher the value, the louder the song. (-40 to 10)
Valence: How “happy” a song sounds. (0-100)
Acoustics: The higher the value, the more acoustic the song is. (0-100)
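To keep the six metrics organized per song, one option is a small record type that also flags values outside the listed ranges. This is a hypothetical sketch: `TrackFeatures` and `FEATURE_RANGES` are illustrative names, not part of any Spotify library, and the ranges are simply the ones listed above. Note that the Spotify Web API itself reports energy, danceability, valence and acousticness on a 0 to 1 scale, so values taken straight from the API would need rescaling first.

```python
from dataclasses import dataclass

# Hypothetical: ranges copied from the list above, not from an official spec.
FEATURE_RANGES = {
    "energy": (0, 100),
    "danceability": (0, 1),
    "loudness": (-40, 10),
    "valence": (0, 100),
    "acoustics": (0, 100),
}


@dataclass
class TrackFeatures:
    """One song's metrics (hypothetical record type for this analysis)."""
    title: str
    playlist: str  # e.g. "Workout", "Homework" or "Sleep"
    bpm: float
    energy: float
    danceability: float
    loudness: float
    valence: float
    acoustics: float

    def out_of_range(self) -> list[str]:
        """Names of any ranged features that fall outside FEATURE_RANGES."""
        return [name for name, (low, high) in FEATURE_RANGES.items()
                if not low <= getattr(self, name) <= high]

    def as_vector(self) -> list[float]:
        """The six numeric metrics in a fixed order, ready for analysis."""
        return [self.bpm, self.energy, self.danceability,
                self.loudness, self.valence, self.acoustics]


song = TrackFeatures("Example Song", "Workout", bpm=128, energy=85,
                     danceability=0.7, loudness=-5, valence=60, acoustics=3)
print(song.out_of_range())  # prints []
```

Keeping BPM out of `FEATURE_RANGES` is deliberate: no range was listed for it above.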
The AI analyzes the raw audio file to assign these values, though exactly how remains unclear to me. There are other metrics, such as popularity, artist separation and a number used for shuffling songs, but those aren’t useful for spotting trends in the generated playlists. Now that we have our data and our metrics, we can look for patterns within the groups.
The Results
My first thought when starting the project was that the songs on the Workout playlist would be, on average, significantly higher in nearly every category than the Sleep songs, with Homework falling somewhere between the two. Easy enough. I’d graph the category averages, everything would appear exactly as I predicted, case closed.
Unfortunately, like many predictions in science, mine were completely wrong. Aside from acoustics, none of the categories showed a significant difference in averages, meaning no single metric separates a song placed on Sleep, Homework or Workout. My expectations were shattered. It was back to the drawing board.
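The post doesn’t say which test produced the “no significant difference” result. One common choice for comparing the means of three groups is a one-way ANOVA, sketched below in plain Python under that assumption. The function name and the toy values are hypothetical. A large F means the playlist averages differ by more than the spread within each playlist would explain; an F near zero means the averages barely differ.

```python
from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> float:
    """F statistic: between-playlist variance divided by within-playlist variance."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # How far each playlist's average sits from the overall average.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

    # How spread out songs are around their own playlist's average.
    ss_within = 0.0
    for values in groups.values():
        m = mean(values)
        ss_within += sum((x - m) ** 2 for x in values)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Toy energy values, not the writers' data.
energy = {"Workout": [7, 8, 9], "Homework": [4, 5, 6], "Sleep": [1, 2, 3]}
print(f"F = {one_way_anova_f(energy):.1f}")  # prints F = 27.0
```

With SciPy installed, `scipy.stats.f_oneway(*energy.values())` computes the same F statistic along with its p-value.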
The next day in class, I learned about a method statisticians call Principal Component Analysis (PCA). “A PCA is not a statistical test; it’s simply a visualization of the variation in your data,” says Dr. Alison Davis-Rabowski, researcher and teacher in the Ecology and Evolutionary Biology Department. This was perfect: my first guess had failed, but maybe a PCA could help visualize how the AI might work.
PCA Visual
While the axis labels PC1 and PC2 (‘PC’ stands for Principal Component) may look intimidating, they describe a fairly straightforward concept. Each axis is a weighted combination of my six tested variables (categories): PC1 is the combination that captures the most variance in the data, and PC2 captures the most of what remains. In this graph, songs that are similar are grouped close together (low variance), and songs that differ are spread apart (high variance). While PC2 doesn’t reveal much, the PC1 axis indicates a grouping exists: all the workout songs sit on one side of the x-axis, while the rest are more scattered. Digging further into the results, the variance captured by PC1 comes mainly from energy, loudness and acoustics, which separate the groups.
PCA of the playlists (generated by the author)
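The PCA described above can be sketched with NumPy. This is an illustration under assumptions, not the author’s code: it standardizes each metric first (so BPM, which runs into the hundreds, doesn’t drown out danceability on a 0 to 1 scale), then uses a singular value decomposition. The random matrix is placeholder data, not the writers’ playlists. The first row of `loadings` shows how heavily each metric weighs on PC1, which is how a finding like “PC1 is driven by energy, loudness and acoustics” can be read off.

```python
import numpy as np

FEATURES = ["bpm", "energy", "danceability", "loudness", "valence", "acoustics"]


def pca(X: np.ndarray, n_components: int = 2):
    """PCA of a songs-by-features matrix.

    Returns (scores, explained_ratio, loadings):
      scores          each song's coordinates on PC1, PC2, ...
      explained_ratio share of total variance captured by each component
      loadings        weight of each feature in each component
    """
    # Standardize each column (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variance = S ** 2 / (len(Z) - 1)
    explained_ratio = variance / variance.sum()
    loadings = Vt[:n_components]
    scores = Z @ loadings.T
    return scores, explained_ratio[:n_components], loadings


# Placeholder data: 150 "songs" x 6 metrics drawn at random, NOT real playlist values.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, len(FEATURES)))
scores, ratio, loadings = pca(X)
print("PC1/PC2 share of variance:", ratio.round(2))
for name, weight in zip(FEATURES, loadings[0]):
    print(f"{name:>12}: {weight:+.2f}")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by playlist, produces a chart of the kind shown above. scikit-learn’s `sklearn.decomposition.PCA` (after `StandardScaler`) does the same job, exposing the loadings as `components_` and the shares as `explained_variance_ratio_`.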
While the averages may not have indicated a difference, the standard tests for comparing averages assume the values in each playlist follow a normal distribution, something I didn’t account for when running the original analysis. With a limited data set and a high likelihood of skew due to grouping, it was unlikely the data would be normal. Instead, the data’s variance, or spread, would be a better indicator of potential groupings. Comparing histograms of each metric across the playlists could point to the factors the AI algorithm considers.
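One way to check the normality concern is to measure how lopsided each playlist’s values are. The sketch below computes the Fisher-Pearson skewness (near 0 for symmetric data such as a normal distribution) alongside each playlist’s sample variance. The function names and toy values are hypothetical, and this is a check one could run rather than a record of the analysis performed here.

```python
from statistics import mean, variance


def skewness(values: list[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (assumes values aren't all equal)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def spread_summary(groups: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Map each playlist to (sample variance, skewness) for one metric."""
    return {name: (variance(vals), skewness(vals)) for name, vals in groups.items()}


# Toy acoustics values, not the writers' data: Workout clusters near 0 with one outlier.
acoustics = {
    "Workout": [1, 1, 2, 2, 3, 30],
    "Sleep": [40, 55, 60, 70, 85, 90],
}
for name, (var, skew) in spread_summary(acoustics).items():
    print(f"{name:>8}: variance={var:.1f}, skewness={skew:+.2f}")
```

A skewness far from 0 is a warning sign for tests that assume normality. SciPy provides `scipy.stats.skew` for the same statistic and `scipy.stats.kruskal`, a rank-based alternative to ANOVA that does not assume normality.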
Looking at the variances among groups, a few trends across the playlists emerge. To start with the most obvious: Workout songs DO NOT have high acoustics. It makes sense logically, but I find it interesting that this seems to carry such high priority in our data set. Meanwhile, sleep songs typically register as ‘sad’ (low valence), a product of sad songs often being slower, more acoustic and lower-energy. Still, the idea of the algorithm wanting you to end your day listening to sad music is hard for me to grasp.
Another point of interest is the data that isn’t so clear to read. For example, BPM is widely scattered within each playlist, despite studies showing that heart rate can be influenced by tempo. This suggests BPM is not a significant driving factor in how the AI picks music for moods, something that threw me for a loop. The shapes of the Energy and Dance histograms reinforce my original hypothesis that energy and danceability will often be higher in Workout, lower in Sleep and somewhere in the middle in Homework: a clear, simple visual, which is rare in data analytics.
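Comparing histogram shapes across playlists only works if every playlist is binned the same way. The hypothetical helper below counts values into shared, equal-width bins and prints one row per playlist; the bin range and toy energy values are illustrative assumptions, not the writers’ data.

```python
import math


def histogram(values: list[float], low: float, high: float, bins: int) -> list[int]:
    """Count values into `bins` equal-width bins covering [low, high]."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if not low <= v <= high:
            continue  # skip values outside the shared range
        # floor() picks the bin; min() puts v == high into the last bin.
        counts[min(math.floor((v - low) / width), bins - 1)] += 1
    return counts


def print_histograms(groups: dict[str, list[float]], low: float, high: float,
                     bins: int = 5) -> None:
    """One text histogram per playlist, all on the same bins so shapes line up."""
    for name, values in groups.items():
        row = " ".join(f"{c:2d}" for c in histogram(values, low, high, bins))
        print(f"{name:>8} | {row}")


# Toy energy values on the 0-100 scale, not the writers' data.
energy = {
    "Workout": [78, 85, 90, 92, 95],
    "Homework": [35, 48, 55, 60, 72],
    "Sleep": [5, 12, 20, 25, 41],
}
print_histograms(energy, low=0, high=100)
```

With matplotlib, passing the same `bins` sequence to `plt.hist` for each playlist achieves the same alignment.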
I think the most interesting result comes from the data I selected. I intentionally picked my participants based on two criteria: 1) they listen to a lot of music, and 2) they listen to a variety of genres. Although more data would be better here, it was interesting that regardless of genre preferences and user data, the songs suggested for each mood were similar, indicating the Spotify algorithm may not be as tailored to individual preferences as it seems.
For humans, it’s easy to pick the perfect song for the moment. For AI, conveying the complexities of emotion through playlist creation seems like an impossible task, yet for the most part, Spotify’s AI does a good job. While many of the ins and outs of Spotify’s AI tools will remain hidden from the public eye, data analysis can uncover major facets of the developing technology and show how AI is depicting the human experience.
** All values for API metrics sourced from http://sortyourmusic.playlistmachinery.com/
Spotify. (2024, April 7). Spotify Premium users can now turn any idea into a personalized playlist with AI Playlist in Beta [Screenshot by author]. Spotify Newsroom. https://newsroom.spotify.com/2024-04-07/spotify-premium-users-can-now-turn-any-idea-into-a-personalized-playlist-with-ai-playlist-in-beta/
DJ Mag. (2023, May 17). Spotify’s AI DJ has launched in the UK and Ireland [Screenshot by author]. https://djmag.com/tech/spotifys-ai-dj-has-launched-uk-and-ireland