To find communities in the network, we used the Louvain algorithm as implemented in community.community_louvain.best_partition(). The GCC of the Beatles network was converted to an undirected graph and passed to this algorithm as input. It found 9 communities, with a modularity of 0.595.
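The preprocessing step described above can be sketched without any graph library. The snippet below is a minimal illustration, not the code used for the analysis: it assumes the network is available as a list of directed (source, target) pairs, drops the edge direction, and keeps the largest connected component. The song pairs and function names are hypothetical placeholders.

```python
from collections import defaultdict, deque

def undirected_adjacency(directed_edges):
    """Drop edge direction: every (u, v) pair becomes a symmetric neighbour relation."""
    adj = defaultdict(set)
    for u, v in directed_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def giant_component(adj):
    """Return the node set of the largest connected component, found by BFS."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical toy edges, not taken from the Beatles network.
toy_edges = [
    ("Love Me Do", "Help!"),
    ("Help!", "Yesterday"),
    ("Yesterday", "Love Me Do"),
    ("Eleanor Rigby", "Taxman"),
]
gcc_nodes = giant_component(undirected_adjacency(toy_edges))
```

The undirected subgraph induced by `gcc_nodes` is the kind of object that would then be handed to best_partition().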
Separating the network into communities helps to analyze how songs relate to one another under this linking method, and which characteristics and peculiarities each community has. As a first look at each community, we named it after its highest-degree song and generated a word cloud. Below are the most significant words for each community according to the TC-IDF index.
The table reveals some characteristics of each community. Although most communities share some of the most common words, a few differences stand out. For example, in community 4, which includes Love Me Do, love is the most used word, but ill and mine are also among the most common. In contrast, in community 0, which includes Strawberry Fields Forever, love is the second most used word, and the other most common words are say, good, hello and know. Although this is not enough to claim that the communities differ significantly, the general perception of love seems to change, from a sadder, more negative perspective in community 4 to a more optimistic, joyful one in community 0.
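The ranking behind such a table can be illustrated with a small sketch. It assumes that TC-IDF means the raw term count of a word within a community multiplied by log(N / n_w), where N is the number of communities and n_w the number of communities containing the word. The report does not spell out the exact formula, so this definition, the function name and the toy tokens are all assumptions.

```python
import math
from collections import Counter

def tc_idf(community_tokens):
    """community_tokens: dict mapping community name -> list of word tokens.

    Returns dict mapping community name -> {word: TC-IDF score}.
    Assumed definition: term count * log(N / number of communities with the word).
    """
    n_docs = len(community_tokens)
    counts = {name: Counter(tokens) for name, tokens in community_tokens.items()}
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())  # each community counts once per word
    return {
        name: {w: tc * math.log(n_docs / doc_freq[w]) for w, tc in word_counts.items()}
        for name, word_counts in counts.items()
    }

# Hypothetical toy tokens, not the real lyrics.
toy_communities = {
    "A": ["love", "love", "mine", "ill"],
    "B": ["love", "hello", "hello", "say"],
}
scores = tc_idf(toy_communities)
top_word_b = max(scores["B"], key=scores["B"].get)
```

Under this definition a word that appears in every community scores zero, which is why variants with smoothing are sometimes preferred when very common words should still be ranked.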
On the other hand, if we look at the unique words, that is, at how many songs of a community each word appears in, even the most common words are present in only about half of the community's songs. This indicates that, although each community has some sense of identity, the partition is still far from cleanly separating topics: each community is usually a mix of different words, and therefore of moods.
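The song-level count mentioned above can be computed as in the following sketch, which assumes each song in a community is available as a list of tokens. Each word is counted at most once per song, so the result is the fraction of songs that contain it. The function name and the toy songs are illustrative only.

```python
from collections import Counter

def song_coverage(songs):
    """songs: list of token lists, one per song in a community.

    Returns word -> fraction of songs in which the word appears at least once.
    """
    presence = Counter()
    for tokens in songs:
        presence.update(set(tokens))  # set(): repetitions inside a song count once
    return {word: n / len(songs) for word, n in presence.items()}

# Hypothetical toy community of four songs.
toy_songs = [
    ["love", "love", "me"],
    ["love", "you"],
    ["hello", "goodbye"],
    ["say", "you"],
]
coverage = song_coverage(toy_songs)
```

In this toy community, love appears in two of the four songs, so its coverage is 0.5 even though it is the most frequent token overall.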
The per-community statistics reveal further characteristics. First, there are significant differences in the number of words per song: while the Love Me Do community averages around 60 words, the Eleanor Rigby community averages 93. Moreover, the vocabulary of the Eleanor Rigby community is also richer, with a higher average word length and a slightly higher percentage of unique words. However, more words do not necessarily mean happiness. In this case, Love Me Do is definitely the happiest community, while Eleanor Rigby is one of the saddest. So you know, in times of trouble, love me do.
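A sketch of how these three measures might be computed per song and then averaged per community is given below. It assumes tokenized lyrics, and the names `lexical_stats` and `community_average` are hypothetical helpers, not functions from the original analysis.

```python
from statistics import mean

def lexical_stats(tokens):
    """Word count, average word length and percentage of unique words for one song."""
    if not tokens:
        raise ValueError("song has no tokens")
    n_words = len(tokens)
    return {
        "n_words": n_words,
        "avg_word_length": sum(len(t) for t in tokens) / n_words,
        "pct_unique": 100 * len(set(tokens)) / n_words,
    }

def community_average(songs):
    """Average each per-song statistic over all songs of a community."""
    per_song = [lexical_stats(tokens) for tokens in songs]
    return {key: mean(s[key] for s in per_song) for key in per_song[0]}

# Hypothetical toy songs, far shorter than real lyrics.
toy_community = [
    ["love", "me", "do", "love", "me", "do"],
    ["all", "the", "lonely", "people"],
]
first_song = lexical_stats(toy_community[0])
averages = community_average(toy_community)
```

Averaging per song, as done here, weights every song equally. Pooling all tokens of a community before computing the statistics would instead give longer songs more weight.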
The subplots suggest a positive correlation between the number of words per song and the number of unique words, which makes sense: the more words a song uses, the more chances it has of adding unique ones. More importantly, this lets us rule out the possibility that longer songs are merely built from repetition: the vocabulary grows as the lyrics get longer.
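Correlations like this one are read off the plots. One way to put a number on them is the Pearson coefficient computed over the per-community averages. The sketch below is a plain implementation under that assumption, and the input values are placeholders rather than figures from the report.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Placeholder values: one entry per community, not real measurements.
words_per_song = [1, 2, 3, 4]
unique_words = [2, 4, 6, 8]
r_positive = pearson(words_per_song, unique_words)
r_negative = pearson(words_per_song, unique_words[::-1])
```

With only 9 communities, a coefficient computed this way rests on very few points, so it supports the visual impression rather than proving a relationship.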
Moreover, it can be seen that as the number of songs in a community increases, its average degree increases too. This seems reasonable, since a community is defined as a locally dense connected subgraph of a network: more songs inside a community mean more links between them, and therefore a higher degree per node. The Beatles network has a modularity of approximately 0.6, which indicates that the partition is not random and that some cluster structure is present.
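To make the modularity value concrete, the sketch below computes it for an unweighted, undirected graph using the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The toy graph (two triangles joined by a single edge) is an assumption chosen for illustration.

```python
def modularity(edges, partition):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping node -> community id.
    """
    m = len(edges)
    intra = {}   # community -> number of edges with both endpoints inside it
    degree = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Toy graph: two triangles joined by the single edge (c, d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0, the value expected when the partition carries no structure.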
Additionally, there seems to be a negative correlation between the average degree and the number of words per song. This is surprising considering our link criterion (two songs are connected if they share at least one of their 5 most common words). It might indicate that short songs talk about more connected topics, whereas longer ones talk about more unconventional ones.
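The link criterion can be sketched as follows. The snippet assumes tokenized lyrics per song and produces undirected pairs. How ties among equally frequent words were broken, and how edge direction was assigned in the original network, is not stated, so both are left as assumptions here.

```python
from collections import Counter
from itertools import combinations

def top_words(tokens, k=5):
    """The k most frequent words of a song (ties follow first occurrence in the lyrics)."""
    return {word for word, _ in Counter(tokens).most_common(k)}

def build_edges(lyrics, k=5):
    """Connect two songs if their top-k word sets share at least one word."""
    tops = {title: top_words(tokens, k) for title, tokens in lyrics.items()}
    return [(a, b) for a, b in combinations(sorted(tops), 2) if tops[a] & tops[b]]

# Hypothetical toy lyrics, not the real songs.
toy_lyrics = {
    "Song A": ["love", "love", "me", "do"],
    "Song B": ["love", "you", "hello"],
    "Song C": ["rain", "sun"],
}
toy_links = build_edges(toy_lyrics)
```

Because only the 5 most common words of each song matter, a long song does not automatically gain more links than a short one, which is consistent with the observation above.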
Furthermore, both the average VADER score and the average degree differ markedly across communities, and the two appear to be positively correlated. Communities with more positive songs, which are also the shorter ones, tend to have a higher average degree. In contrast, longer songs that talk about rarer topics also tend to be less positive.
Two conclusions can be drawn from this plot. First, although 9 communities are present, just 5 of them account for 82.5% of the total degree in the network. Second, although community 8 is the second largest community (just one song fewer than the largest), it contains only 12.6% of the total degree and ranks 5th in terms of degree. Having more songs in a community does not necessarily mean being more connected.
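The shares quoted above could be derived as in the sketch below, which assumes a mapping from each song to its degree and to its community. The function names and the toy values are hypothetical and chosen only to show a larger community holding a smaller share of the total degree.

```python
def degree_shares(degrees, partition):
    """Share of the network's total degree held by each community.

    degrees: dict mapping node -> degree; partition: dict mapping node -> community id.
    Returns a list of (community, share) pairs sorted from largest to smallest share.
    """
    totals = {}
    for node, degree in degrees.items():
        community = partition[node]
        totals[community] = totals.get(community, 0) + degree
    grand_total = sum(totals.values())
    shares = ((c, total / grand_total) for c, total in totals.items())
    return sorted(shares, key=lambda item: item[1], reverse=True)

def cumulative_share(shares, k):
    """Combined share of the k communities with the largest shares."""
    return sum(share for _, share in shares[:k])

# Toy values: community 1 has more songs but less total degree than community 0.
toy_degrees = {"a": 3, "b": 3, "c": 1, "d": 1, "e": 1, "f": 1}
toy_partition = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}
shares = degree_shares(toy_degrees, toy_partition)
top_one = cumulative_share(shares, 1)
```

In the toy example, community 0 has two songs but 60% of the total degree, while community 1 has four songs and only 40%, mirroring the situation described for community 8.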