We have some properly formatted data in the data frame "samples"
We are aware that Funk and its ilk are the most represented sampled genre
We used some quick plots to identify important trends and seminal periods of work
We created and checked our data against expanded factorized genres
We used layers and more detailed factors to analyze sampled and sampling artists and songs
Begin to explore our data using all our available tools
Now that we have our frequency data and more, we can begin to explore our data freely and build detailed plots that reveal interesting features. Using our data, we can plot our top sampling artists and songs and inspect which genres their samples represent. We begin with our previous frequency tables, sort them in descending order of frequency, and pick off the 20 most frequent entries for our graphics.
> SAFreq <- SAFreq[order(-SAFreq$Sampling.Artist.Frequency),]
Then, we use subset() to select the rows of our factorized data frame whose sampling artist appears in the top 20.
> topArtistFrame <- subset(factorized, Sampling.Artist %in% SAFreq[1:20,1])
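The sort-and-subset steps above can be tried on a small stand-in data set. This is only a sketch: the toy data frame and its values are invented, and the frequency table is assumed to have the same shape as SAFreq (name in column 1, count in column 2). It also shows why unused factor levels need dropping before plotting.

```r
# Invented toy stand-in for the factorized data frame.
toy <- data.frame(
  Sampling.Artist = factor(c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D")),
  Sampled.Genre   = factor(c("Funk", "Soul", "Funk", "Jazz", "Funk",
                             "Rock", "Funk", "Funk", "Soul", "Jazz"))
)

# Frequency table assumed to be shaped like SAFreq: name first, count second.
toyFreq <- as.data.frame(table(toy$Sampling.Artist))
names(toyFreq) <- c("Sampling.Artist", "Sampling.Artist.Frequency")

# Sort descending by frequency, then keep rows belonging to the top 2 artists.
toyFreq <- toyFreq[order(-toyFreq$Sampling.Artist.Frequency), ]
topToy  <- subset(toy, Sampling.Artist %in% toyFreq[1:2, 1])

# subset() keeps the unused factor levels; droplevels() removes them.
nlevels(topToy$Sampling.Artist)
nlevels(droplevels(topToy)$Sampling.Artist)
```

Here the top 2 stand in for the top 20 used in the text; only the row range changes.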
After this, we build our plot piece by piece. First we call ggplot(), setting the data and using drop = TRUE to eliminate the unused artist/song levels. Next we add a geom_jitter() layer with the jitter height set to .03; height acts on the y aesthetic (the publishing date), so the points stay close to their true dates while the default width spreads them within each category. We then overlay a boxplot at low opacity (alpha = .2), and finally use coord_flip() to free up space for the artist/song titles.
> ggplot(topArtistFrame,aes(y=Sampled.Publishing.Date,x=Sampling.Artist[,drop=TRUE])) + geom_jitter(aes(color = Sampled.Genre),position=position_jitter(height=.03)) + stat_boxplot(alpha=.2) + coord_flip()
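The same plot can be assembled one layer at a time, which makes it easier to see what each piece contributes. The sketch below uses invented toy data and assumes Sampled.Publishing.Date is a numeric year; it also swaps in droplevels() and geom_boxplot() as equivalents of the [,drop=TRUE] and stat_boxplot() calls above. Note that the height argument of position_jitter() acts on the y aesthetic (the date) and width on x (the artist), regardless of coord_flip().

```r
library(ggplot2)

# Invented toy data; the publishing date is assumed to be a numeric year.
toy <- data.frame(
  Sampling.Artist = factor(c("A", "A", "A", "A", "B", "B", "B", "B", "C")),
  Sampled.Genre   = factor(c("Funk", "Soul", "Funk", "Jazz", "Funk",
                             "Rock", "Funk", "Soul", "Jazz")),
  Sampled.Publishing.Date = c(1969, 1972, 1974, 1980, 1968,
                              1971, 1973, 1977, 1990)
)

# Keep two artists and drop the now-unused level "C".
top <- droplevels(subset(toy, Sampling.Artist %in% c("A", "B")))

# Base plot: data and axis mappings only.
p <- ggplot(top, aes(x = Sampling.Artist, y = Sampled.Publishing.Date))
# Layer 1: jittered points, colored by genre, with very little date jitter.
p <- p + geom_jitter(aes(color = Sampled.Genre),
                     position = position_jitter(height = 0.03))
# Layer 2: mostly transparent boxplot on top of the points.
p <- p + geom_boxplot(alpha = 0.2)
# Flip so the artist names run down the vertical axis.
p <- p + coord_flip()
print(p)
```

Mapping color to the bare column name Sampled.Genre, rather than topArtistFrame$Sampled.Genre, lets ggplot2 look the column up in the data passed to ggplot().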
Now we can do the same for the top songs, making sure to delete the first result, which is a "?" representing no data.
> SSFreq <- SSFreq[order(-SSFreq$Sampling.Song.Frequency),]
> SSFreq <- SSFreq[-1,]
> topSongFrame <- subset(factorized, Sampling.Song %in% SSFreq[1:20,1])
> ggplot(topSongFrame,aes(y=Sampled.Publishing.Date,x=Sampling.Song[,drop=TRUE])) + geom_jitter(aes(color = Sampled.Genre),position=position_jitter(height=.03)) + stat_boxplot(alpha=.2) + coord_flip()
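Deleting the first row works because "?" happens to be the most frequent entry after sorting. A slightly more defensive variant, sketched here on an invented frequency table assumed to be shaped like SSFreq, removes the placeholder by value instead of by position:

```r
# Invented toy frequency table (song in column 1, count in column 2).
toySSFreq <- data.frame(
  Sampling.Song = factor(c("Song X", "?", "Song Y", "Song Z")),
  Sampling.Song.Frequency = c(12, 40, 7, 9)
)

# Sort descending by frequency, as in the text.
toySSFreq <- toySSFreq[order(-toySSFreq$Sampling.Song.Frequency), ]

# Drop the "no data" placeholder wherever it ends up in the ordering.
toySSFreq <- toySSFreq[toySSFreq$Sampling.Song != "?", ]

head(toySSFreq, 2)
```

This gives the same result as SSFreq[-1,] whenever "?" sorts to the top, and still behaves sensibly if it does not.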
In essence, we can start to get a picture of how the different genres break down among our samples. For songs with only a few samples, the strong representation of the major genres appears to come from a small number of repeatedly sampled tracks, most likely because the sample is a single beat or backing track. In contrast, as sampling artists use more samples, we see a greater variety of genres in play. Go ahead and check out Part 6 for some publication-quality graphs that demonstrate this.
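One rough way to check the relationship between the number of samples and genre variety, separate from the plots, is to tabulate both per sampling artist. The sketch below uses invented toy data and hypothetical names (nSamples, nGenres, summaryFrame); it illustrates the idea only and is not the analysis behind the conclusion above.

```r
# Invented toy data: one row per sample used by a sampling artist.
toy <- data.frame(
  Sampling.Artist = c("A", "A", "A", "A", "B", "C", "C"),
  Sampled.Genre   = c("Funk", "Soul", "Jazz", "Funk", "Funk", "Funk", "Rock")
)

# Number of samples and number of distinct sampled genres per artist.
nSamples <- tapply(toy$Sampled.Genre, toy$Sampling.Artist, length)
nGenres  <- tapply(toy$Sampled.Genre, toy$Sampling.Artist,
                   function(g) length(unique(g)))

summaryFrame <- data.frame(
  Sampling.Artist = names(nSamples),
  Samples = as.vector(nSamples),
  Genres  = as.vector(nGenres)
)

# Artists with the most samples first.
summaryFrame[order(-summaryFrame$Samples), ]
```

On the real data, the same two tapply() calls could be pointed at the factorized data frame to see whether genre counts grow with sample counts.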