This study begins with basic exploratory work centered on audio features. With a better understanding of the nature of the variables, we proceed to more advanced analyses (clustering analysis and association rule analysis) targeting songs. The findings of the exploratory analysis give us the confidence to conduct predictive analysis on the variables of interest. After that, we shift our focus to song themes and artists. While the analysis of themes does not yield satisfactory results, the network analysis of artists offers interesting findings.
Finally, we conclude that: (1) based on the findings of the clustering analysis, a given category may contain subcategories; (2) classification based on audio features yields robust and satisfactory results (up to 88% accuracy across different supervised learning schemes); (3) artists from earlier eras are musically similar to musicians of our generation.
However, the results are vulnerable to biases caused by several limitations of the study. First, the data may carry a Spotify-specific bias, since Spotify is the only source from which we were able to access such data. Additionally, due to technical limitations, we were able to obtain only a portion of the dataset, which may lead to sample selection bias.
Nevertheless, the study adds insight into the analysis of music. Further analysis could combine audio features with text analysis of the lyrics. Meanwhile, the network analysis points to a direction for studies of musical heritage, which would require more musical knowledge.