About the project

Billboard Chart Data

Using the billboard.py library, I pulled in data for three types of songs that could be written:

One Hit Wonders (past three months of weekly hits in 2021)
Songs that Last (year end top 100 songs from 2006-2020)
Global Phenomena (past three months of the top 200 global chart)

This gave me the song title and artist name, which were then fed into the Spotify API to return a dataframe of audio features. (The full set of audio features is explained in my Wiki.)
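Because a song can appear on several weekly charts in a row, the title/artist pairs likely need de-duplicating before they are sent to Spotify. The following is a minimal sketch of that step, not the project's actual code: the helper names `unique_tracks` and `fetch_weekly_entries`, the chart name, and the demo entries are all assumptions made for illustration. It relies only on billboard.py's `ChartData` objects exposing an `entries` list whose items carry `title` and `artist`.

```python
from types import SimpleNamespace


def unique_tracks(entries):
    """Return (title, artist) pairs in first-seen order, dropping repeats."""
    seen = set()
    tracks = []
    for entry in entries:
        key = (entry.title.strip(), entry.artist.strip())
        if key not in seen:
            seen.add(key)
            tracks.append(key)
    return tracks


def fetch_weekly_entries(chart_name, dates):
    """Fetch one chart per date (hypothetical helper; needs network access)."""
    import billboard  # imported lazily so unique_tracks works without it

    entries = []
    for date in dates:
        chart = billboard.ChartData(chart_name, date=date)
        entries.extend(chart.entries)
    return entries


# Offline demo with made-up stand-in entries instead of a live chart.
demo_entries = [
    SimpleNamespace(title="Song A", artist="Artist 1"),
    SimpleNamespace(title="Song B", artist="Artist 2"),
    SimpleNamespace(title="Song A", artist="Artist 1"),  # repeat from a later week
]
demo_tracks = unique_tracks(demo_entries)
```

With a live connection, something like `unique_tracks(fetch_weekly_entries("hot-100", ["2021-03-06", "2021-03-13"]))` would produce the list of pairs to look up; the dates here are placeholders.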

Spotify Audio Features

I used the Spotify API's audio features to derive three components of a song: Vibe, Composition, and Mood. Vibe looked at how much bass was in a song, whether it required the ability to rap, how danceable the track was, how energetic it was, and how fast the tempo was. Anything that wasn't already on a scale of 0-1 was scaled using scikit-learn's MinMaxScaler so it could all fit on a radar chart. Composition looked at the more technical parts of the song: what key it is in, whether it is major or minor, what the time signature is, how long the song is, and whether the track is mostly acoustic or instrumental. Mood plotted energy versus valence (the "happiness" of a song's audio), which showed whether angry, happy, sad, or peaceful music was more popular among hit songs.
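The scaling and the mood quadrants can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: `min_max_scale` is a plain-Python stand-in for the same (x - min) / (max - min) formula that MinMaxScaler applies by default, and the 0.5 cut-off and the function name `mood_quadrant` are hypothetical choices for splitting the energy/valence plane into the four moods.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range (same formula as min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to spread out, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mood_quadrant(energy, valence, threshold=0.5):
    """Label a track by quadrant; the 0.5 threshold is an assumed cut-off."""
    if energy >= threshold:
        return "happy" if valence >= threshold else "angry"
    return "peaceful" if valence >= threshold else "sad"


# Tempo is reported in beats per minute, so it needs rescaling for a radar chart.
tempos = [80.0, 120.0, 160.0]
scaled_tempos = min_max_scale(tempos)

# Energy and valence are already on a 0-1 scale.
label = mood_quadrant(energy=0.9, valence=0.2)
```

High energy with low valence lands in the "angry" quadrant, while low energy with high valence lands in "peaceful".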

Genius Lyrics NLP

I fed the song title and artist name into lyricsgenius' searchsongs function, which returned the lyrics for each song found. These lyrics were then fed into gensim's LDA model, which returned a series of topics along with the words that contributed most to each topic. The number of topics for each dataset was chosen to optimize the coherence score, resulting in 2 topics each for One Hit Wonders and Songs that Last, and 3 topics for Global Phenomena. I created a word cloud of the 100 most common phrases found by gensim's Phrases model. Song repetitiveness was calculated by dividing the number of unique words by the total word count. Because of the profanity in the lyrics, some words have been censored.
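The repetitiveness measure is simple enough to show directly. This is a minimal sketch, assuming lowercase word tokens split on anything that is not a letter or apostrophe; the tokenizer and the name `unique_word_ratio` are illustrative choices, not necessarily what the project used. A lower ratio means a more repetitive song.

```python
import re


def tokenize(lyrics):
    """Split lyrics into lowercase word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z']+", lyrics.lower())


def unique_word_ratio(lyrics):
    """Unique words divided by total words; 0.0 for empty lyrics."""
    words = tokenize(lyrics)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Made-up chorus: 2 unique words ("la", "hey") out of 6 total.
chorus = "La la la hey hey la"
ratio = unique_word_ratio(chorus)
```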

A few words of thanks...

This project was the final capstone for Nashville Software School's Data Science Cohort 4. Utilizing what I had learned in the past 9 months, I was able to create this project with the help of many others. Thank you to everyone who made their packages and libraries open source--you made my life so much easier. Thank you to all my teachers (especially Michael Holloway) for answering my continuous Slack messages. Thank you to my classmates, who I learned a lot from. Thank you to NSS as well for allowing me into your program and a huge thank you to Paulo Martinez for your mentorship and generous contribution. I hope to pay it forward.

zylstraa/capstone

Data and packages used:

If you're interested in my classmates' projects, feel free to check out our class website!
