Finally, we move the analysis to a more aggregated musical unit: the artist. To observe the relationships among artists, we build a network in which each node represents an artist and an edge between two artists indicates that they are similar.
We hand-picked the six most relevant normalized features (acousticness, danceability, energy, valence, instrumentalness, and speechiness) as our independent variables. We then aggregate the tracks dataset by artist, taking the mean of each feature to obtain a summary for each artist. Because the full set of artists is too large to run calculations on and to plot in a single graph, we keep only the top 100 artists ranked by the popularity score provided by Spotify. To determine similarity, we first choose a threshold: if the Euclidean distance between two artists is below the threshold, an edge connects them. Next, we use the function 10*e^(-x) to determine the weight of an edge, where x is the Euclidean distance. This bounds the weight: 10 is the maximum (at x = 0) and 0 is the theoretical minimum (as x grows). The nodes are colored by their centrality, where a dark color stands for low centrality and a light color for high centrality.
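The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the record layout (a list of dicts with an "artist" key), the helper names artist_means, edge_weight and build_edges, the toy tracks, and the threshold value of 0.5 are all assumptions made for the example. The step that keeps the top 100 artists by popularity is omitted.

```python
import math
from collections import defaultdict
from itertools import combinations

# The six audio features named in the text.
FEATURES = ["acousticness", "danceability", "energy",
            "valence", "instrumentalness", "speechiness"]


def artist_means(tracks):
    """Average each feature over all tracks by the same artist."""
    grouped = defaultdict(list)
    for track in tracks:
        grouped[track["artist"]].append([track[f] for f in FEATURES])
    return {artist: [sum(col) / len(rows) for col in zip(*rows)]
            for artist, rows in grouped.items()}


def edge_weight(distance):
    """Weight 10 * e^(-x): 10 at distance 0, approaching 0 as distance grows."""
    return 10 * math.exp(-distance)


def build_edges(profiles, threshold):
    """Connect two artists when their Euclidean distance is below threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        d = math.dist(profiles[a], profiles[b])
        if d < threshold:
            edges[(a, b)] = edge_weight(d)
    return edges


def toy_track(artist, value):
    """Hypothetical track whose six features all share one value."""
    return {"artist": artist, **{f: value for f in FEATURES}}


# Toy data (made up): artists A and B are close, C is far from both.
tracks = [toy_track("A", 0.2), toy_track("A", 0.4),
          toy_track("B", 0.3), toy_track("C", 0.9)]

profiles = artist_means(tracks)
edges = build_edges(profiles, threshold=0.5)  # threshold chosen arbitrarily
print(edges)
```

With these toy values only A and B are linked, and their weight is close to the maximum of 10 because their averaged feature vectors nearly coincide. The choice of threshold directly controls how many edges survive, so in practice it would need to be tuned against the real distance distribution.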
While half of the nodes are not interconnected, the other half form three densely connected subnetworks. One interesting observation is that the three nodes with extremely high centrality are rock bands from the 60s to the 80s: 4 Non Blondes, an American rock band from San Francisco formed in 1989; Buffalo Springfield, a Canadian-American rock band active from 1966 to 1968; and Silver, an American rock band from the 70s. It is surprising that musicians from almost half a century ago are so strongly connected to later-generation artists of various kinds. Such connections and signs of heritage provide insight into the evolution of modern music.
Description of Network:
Number of nodes: 100
Number of edges: 267
Average degree: 5.3400
Density: 0.054
Average betweenness of the network: 0.0038
Average clustering coefficient: 0.48
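As a quick consistency check, the average degree and density reported above follow directly from the node and edge counts of an undirected graph. The sketch below assumes the network is undirected and has no self-loops; the variable names are illustrative only.

```python
n_nodes = 100
n_edges = 267

# Each undirected edge contributes to the degree of two nodes.
average_degree = 2 * n_edges / n_nodes

# Density = actual edges / possible edges, with n(n-1)/2 possible edges.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(f"Average degree: {average_degree:.4f}")  # 5.3400
print(f"Density: {density:.3f}")                # 0.054
```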
Rest of the metrics: