The data used in this project were extracted mainly from three websites. They are presented below, together with the code used to build the dataset.
1) The Beatles official page. It provided a reliable list of all available Beatles songs, which represent the nodes of the Beatles network, together with their corresponding lyrics. The lyrics were then used to define the edges of the network according to a specific criterion.
2) The Beatles Wiki. Created in 2006 and edited in the years since, it currently contains a total of 775 pages. It was our main source for extracting information from the wiki pages of Beatles songs, such as each song's songwriter, album and release date, which are used as the node attributes of the network. However, a few song pages were missing some of these features, which is why the other two websites were used to fill the gaps.
3) The Beatles Spotify page. Their Spotify profile was considered as a possible source for other, minor attributes of each song related to its musical characteristics, such as the danceability, key or tempo of each track.
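The idea of using the other sources to fill in features missing from the wiki pages can be sketched as follows. This is only a minimal illustration, not the code used in the project: the function name `merge_attributes`, the dictionary layout, and all titles and values are hypothetical placeholders.

```python
def merge_attributes(primary, *fallbacks):
    """Fill missing (None or absent) fields of each primary record
    from the fallback sources, consulted in the order given."""
    merged = {}
    for title, attributes in primary.items():
        record = dict(attributes)  # copy so the input is not modified
        for source in fallbacks:
            for key, value in source.get(title, {}).items():
                if record.get(key) is None and value is not None:
                    record[key] = value
        merged[title] = record
    return merged


# Placeholder records standing in for the three sources (assumed layout).
wiki = {
    "Song A": {"songwriter": "Writer X", "album": None},
    "Song B": {"songwriter": None, "album": "Album Y"},
}
official = {"Song A": {"album": "Album Z"}}
spotify = {"Song B": {"songwriter": "Writer W", "tempo": 120.0}}

merged = merge_attributes(wiki, official, spotify)
```

Values already present in the primary source are never overwritten, so the wiki remains the main source and the other two only fill gaps.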
After considerable data extraction from the .txt files of the different sources, followed by cleaning and preprocessing, we ended up with a total of 301 songs with which to build the most exciting Beatles network of all time.
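The exact cleaning steps are not listed here, so the following is only a sketch of one plausible step, under the assumption that song titles are read line by line from a .txt file: normalising whitespace and dropping empty or duplicated titles. The function name `clean_titles` and the sample lines are hypothetical.

```python
import re


def clean_titles(raw_lines):
    """Collapse whitespace, skip blank lines, and drop case-insensitive
    duplicates while keeping the first spelling encountered."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        title = re.sub(r"\s+", " ", line).strip()
        if not title:
            continue  # skip empty lines
        key = title.casefold()
        if key in seen:
            continue  # skip titles already collected
        seen.add(key)
        cleaned.append(title)
    return cleaned


# Placeholder lines, as they might come out of a raw text file.
raw = ["  Song A \n", "song a\n", "\n", "Song   B\n"]
titles = clean_titles(raw)
```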
If you are curious about what kind of data was used, you can download a .zip file containing all the files by clicking the button below.