We have some properly formatted data in the data frame "samples"
We are aware that Funk and its ilk are the most represented sampled genres
We used some quick plots to identify important trends and seminal periods of work
We created and checked our data against expanded factorized genres
Use layers and more detailed factors to analyze sampled and sampling artists and songs
At this point, we have a few useful tools for inspecting our song data, notably our expanded genres. To inspect more detailed traits, however, we need to calculate several extra factors. One simple set of factors is frequencies, which we can compute with the table() function: it tabulates how often each value occurs in a passed-in column. We then convert each table back into a data frame using as.data.frame() and merge the counts into our dataset using merge(), which matches each frequency to its rows by the defining column. We can do this for all of our factors; for now, let's get frequencies for sampled songs and artists, and for sampling songs and artists.
SFreq <- as.data.frame(table(factorized$Sampled.Song), responseName = "Sampled.Song.Frequency")
AFreq <- as.data.frame(table(factorized$Sampled.Artist), responseName = "Sampled.Artist.Frequency")
SSFreq <- as.data.frame(table(factorized$Sampling.Song), responseName = "Sampling.Song.Frequency")
SAFreq <- as.data.frame(table(factorized$Sampling.Artist), responseName = "Sampling.Artist.Frequency")
# Preview the first merge before overwriting factorized
head(merge(factorized, SFreq, by.x="Sampled.Song", by.y="Var1"))
factorized <- merge(factorized,SFreq, by.x="Sampled.Song",by.y="Var1")
factorized <- merge(factorized,AFreq, by.x="Sampled.Artist",by.y="Var1")
factorized <- merge(factorized,SSFreq, by.x="Sampling.Song",by.y="Var1")
factorized <- merge(factorized,SAFreq, by.x="Sampling.Artist",by.y="Var1")
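To make the table(), as.data.frame(), merge() chain concrete, here is a minimal sketch on a toy data frame. The rows and the names toy and song_freq are made up for illustration and only stand in for the real factorized data.

```r
# Toy stand-in for `factorized`; these rows are invented for illustration.
toy <- data.frame(
  Sampled.Song   = c("Song A", "Song A", "Song B", "Song A", "Song C"),
  Sampled.Artist = c("Artist X", "Artist X", "Artist X", "Artist X", "Artist Y"),
  stringsAsFactors = FALSE
)

# table() counts each distinct value; as.data.frame() turns the counts into
# a two-column frame: Var1 (the value) and the column named by responseName.
song_freq <- as.data.frame(table(toy$Sampled.Song),
                           responseName = "Sampled.Song.Frequency")
print(song_freq)

# merge() attaches the count to every row with a matching song title.
# It also sorts the result by the key column, so the row order changes.
toy <- merge(toy, song_freq, by.x = "Sampled.Song", by.y = "Var1")
print(toy)
```

One thing to keep in mind: table() skips NA values by default, so rows with a missing key have no matching count and are dropped by the default merge. Passing all.x = TRUE to merge() keeps those rows, with NA as their frequency.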
With our new frequencies, let's start to inspect some traits of sampled songs and artists. We can plot our sampled frequencies against publishing year with geom_point(), assign our new expanded genres as colors, and then use layers to overlay a general 2-D density plot with stat_density2d(). This way we can quickly observe the general trends of our frequency data while still being able to inspect our outliers.
ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampled.Song.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampled.Artist.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
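The layering used here can be tried out on toy data. The sketch below assumes ggplot2 is installed; the data frame toy and its columns year, freq, and genre are randomly generated placeholders, not our real data. It uses stat_density_2d(), the current spelling, of which stat_density2d() is an alias.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data
toy <- data.frame(
  year  = round(rnorm(200, mean = 1972, sd = 5)),
  freq  = rpois(200, lambda = 3) + 1,
  genre = sample(c("Funk", "Soul", "Rock"), 200, replace = TRUE)
)

# Aesthetics given in ggplot() are inherited by every layer. The color
# mapping is placed in geom_point() only, so the density layer is estimated
# over all points together rather than once per genre.
p <- ggplot(toy, aes(x = year, y = freq)) +
  geom_point(aes(color = genre)) +
  stat_density_2d()
print(p)
```

Moving color = genre into the top-level aes() would instead draw a separate set of density contours for each genre, which answers a different question than the overall trend.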
We can quickly see that the strong representation of our more significant genres has been driven largely by popular samplings of individual artists and songs. The general shapes are also quite similar to our overall data set, with a huge representation of the early 1970s. Let's compare this to our sampling artists and songs instead.
ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampling.Song.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampling.Artist.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
The difference is clear: in contrast, frequent samplers have a broader scope and draw on a more varied genre base for their samples.
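A visual impression like this can be checked with a quick count. As a rough sketch, assuming the same column names as above, we could count how many distinct sampled genres each sampling artist draws on. The toy rows and the names n_samples, n_genres, and breadth are invented for illustration.

```r
# Toy stand-in for `factorized`; these rows are invented for illustration.
toy <- data.frame(
  Sampling.Artist = c("P", "P", "P", "P", "Q", "Q", "R"),
  Sampled.Genre   = c("Funk", "Soul", "Rock", "Funk", "Funk", "Funk", "Jazz"),
  stringsAsFactors = FALSE
)

# Number of samples per sampling artist.
n_samples <- tapply(toy$Sampled.Genre, toy$Sampling.Artist, length)

# Number of distinct sampled genres per sampling artist.
n_genres <- tapply(toy$Sampled.Genre, toy$Sampling.Artist,
                   function(g) length(unique(g)))

# Both tapply() results are ordered by artist name, so they line up.
breadth <- data.frame(
  Sampling.Artist = names(n_samples),
  Samples         = as.vector(n_samples),
  Distinct.Genres = as.vector(n_genres)
)
print(breadth)
```

Run against the real data, a table like this would show whether artists with many samples also tend to have more distinct genres, which is what the plots suggest.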