2) Using a user's ID and a playlist ID, we pulled each song's audio features, artist name, track name, etc. from a playlist. The end product was a CSV file.
3) To analyze the songs that we liked and disliked, we needed to add "targets" to distinguish between the two. The number 1 signifies liked songs and the number 0 signifies disliked songs.
4) The next step was to examine the audio features we had pulled for the songs in the playlist, using a correlation map. The purpose was to see whether any variables needed to be excluded.
5) We also looked at the distribution of each of the audio features in the two playlists, using histograms.
6) Now that we had our analysis of the songs, it was time to merge the two playlists into one. The merged data had unnecessary columns, so we kept only the ones we needed: the audio features, track name, artist name, and track ID.
7) Before we could move on to the prediction models, we had to convert all of the necessary variables from object type to numeric.
8) Our first prediction model was a Decision Tree Model. We looked at its accuracy, which was 90%, as well as the confusion matrix and the report.
9) Our second prediction model was a Random Forest Model. Its accuracy was 97.5%, and we also looked at the confusion matrix.
10) Our final prediction model was a K-Nearest Neighbors Model. Its accuracy was 87.5%, and we also looked at the confusion matrix and the report.
11) Since the Random Forest Model had the highest accuracy, we used this prediction model to create our customized playlist.
12) The last step was to run our code on a neutral playlist so that it could pick out recommended songs we would possibly like.
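
The steps from adding targets through picking recommendations could be sketched roughly as follows. This is only an illustration, not our exact code: it assumes pandas and scikit-learn, uses randomly generated stand-in data instead of the real playlist CSV files, and the column names, the 80/20 train/test split, and the model settings are all hypothetical. It also assumes the "report" is scikit-learn's classification report, and it computes the correlations after the merge for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names; the real CSV columns may differ.
FEATURES = ["danceability", "energy", "valence", "acousticness"]
ID_COLS = ["track_name", "artist_name", "track_id"]


def fake_playlist(n, center, prefix, rng):
    """Stand-in for a playlist CSV: random audio features stored as strings."""
    data = {f: rng.normal(center, 0.15, n).round(3).astype(str) for f in FEATURES}
    data["track_name"] = [f"{prefix}_song_{i}" for i in range(n)]
    data["artist_name"] = [f"{prefix}_artist_{i % 5}" for i in range(n)]
    data["track_id"] = [f"{prefix}{i:04d}" for i in range(n)]
    data["extra"] = "unused"  # an unnecessary column, dropped after the merge
    return pd.DataFrame(data)


rng = np.random.default_rng(0)
liked = fake_playlist(100, 0.7, "liked", rng)
disliked = fake_playlist(100, 0.3, "disliked", rng)
neutral = fake_playlist(30, 0.5, "neutral", rng)

# Add targets: 1 for liked songs, 0 for disliked songs.
liked["target"] = 1
disliked["target"] = 0

# Merge the two playlists and keep only the columns we need.
songs = pd.concat([liked, disliked], ignore_index=True)
songs = songs[ID_COLS + FEATURES + ["target"]].copy()

# Convert the audio features from strings to numeric values.
songs[FEATURES] = songs[FEATURES].apply(pd.to_numeric, errors="coerce")
neutral[FEATURES] = neutral[FEATURES].apply(pd.to_numeric, errors="coerce")

# Pairwise correlations between features (the numbers behind a correlation map).
corr = songs[FEATURES].corr()

X, y = songs[FEATURES], songs["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and report accuracy, confusion matrix, and classification report.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = accuracy_score(y_test, pred)
    print(name, "accuracy:", scores[name])
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))

# Use the most accurate model to pick songs from the neutral playlist.
best_name = max(scores, key=scores.get)
neutral["predicted_like"] = models[best_name].predict(neutral[FEATURES])
recommended = neutral.loc[neutral["predicted_like"] == 1, ID_COLS]
print(recommended.head())
```

Choosing the model by test accuracy mirrors how we settled on the Random Forest Model; with the synthetic data here, a different model may come out on top.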