Data was collected from the-breaks.com
> ggplot(samples, aes(x = factor(1), fill = Sampled.Genre)) + geom_bar(width = 1) + coord_polar(theta = "y")
Our resulting graph is as follows, and quickly shows us the mighty influence of Soul, Jazz, and Pop on hip hop:
> qplot(x=Sampled.Publishing.Date, y=Sampled.Genre, data=definite.dates, color=Sampled.Genre, geom ="jitter", alpha = I(1/2))
We can see right away the predominance of Soul, Funk, and R&B, and the years of original publication match up with the periods of their popularity and large-scale release. Sampled Rap also seems to be strongly associated with a few years of large releases after 'Rapper's Delight' in 1979, and sampled Comedy with Eddie Murphy and Richard Pryor's big mid-1970s hits.
> qplot(Sampled.Publishing.Date, data=definite.dates, color=Sampled.Genre, fill=Sampled.Genre, geom ="histogram",binwidth=2)
The late 1970s peak is even more apparent in this graph, and you can see that a large number of Jazz samples contributes heavily to it as well.
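As a rough numeric companion to the stacked histogram, the counts per two-year bin and genre can be tabulated directly. This is only a sketch on made-up data: the `toy` frame and its values are hypothetical, and the bin edges chosen here will not necessarily line up with the ones ggplot2 picks for `binwidth=2`.

```r
# Hypothetical stand-in for definite.dates (values are invented).
toy <- data.frame(
  Sampled.Publishing.Date = c(1970, 1971, 1973, 1976, 1977, 1977, 1978),
  Sampled.Genre = c("Soul", "Funk", "Soul", "Jazz", "Jazz", "Soul", "Jazz")
)

# Assign each year to a two-year bin: [1970,1972), [1972,1974), ...
bins <- cut(toy$Sampled.Publishing.Date,
            breaks = seq(1970, 1980, by = 2),
            right = FALSE)

# Cross-tabulate bin against genre; each cell is one segment of a stacked bar.
counts <- table(bins, toy$Sampled.Genre)
print(counts)
```

Reading across a row shows which genres make up a given peak.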
> qplot(x=Sampled.Publishing.Date, y=Sampled.Genre, data=factorized, color=Sampled.Genre, geom ="jitter", alpha = I(1/2))
With our much longer level names, we need to drag out the window a bit, but here's the result:
> SFreq = as.data.frame(table(factorized$Sampled.Song),responseName=c("Sampled.Song.Frequency"))
> AFreq = as.data.frame(table(factorized$Sampled.Artist),responseName=c("Sampled.Artist.Frequency"))
> SSFreq = as.data.frame(table(factorized$Sampling.Song),responseName=c("Sampling.Song.Frequency"))
> SAFreq = as.data.frame(table(factorized$Sampling.Artist),responseName=c("Sampling.Artist.Frequency"))
> factorized <- merge(factorized,SFreq, by.x="Sampled.Song",by.y="Var1")
> factorized <- merge(factorized,AFreq, by.x="Sampled.Artist",by.y="Var1")
> factorized <- merge(factorized,SSFreq, by.x="Sampling.Song",by.y="Var1")
> factorized <- merge(factorized,SAFreq, by.x="Sampling.Artist",by.y="Var1")
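The count-and-merge pattern used above may be easier to follow on a tiny example. The sketch below uses a made-up `toy` data frame (its name and contents are hypothetical) and shows why the merges join on `Var1`: that is the default name `as.data.frame()` gives the first column of a one-dimensional table.

```r
# Hypothetical stand-in for the factorized data frame (values are invented).
toy <- data.frame(
  Sampled.Song  = c("Song A", "Song A", "Song B", "Song A", "Song C"),
  Sampling.Song = c("Track 1", "Track 2", "Track 3", "Track 4", "Track 5"),
  stringsAsFactors = TRUE
)

# table() counts how often each song appears; as.data.frame() turns the
# counts into two columns: Var1 (the song) and the one named by responseName.
toyFreq <- as.data.frame(table(toy$Sampled.Song),
                         responseName = "Sampled.Song.Frequency")

# Join the counts back so every row carries the frequency of its sampled song.
toyMerged <- merge(toy, toyFreq, by.x = "Sampled.Song", by.y = "Var1")
print(toyMerged)
```

Each row of the merged frame keeps its original columns and gains the frequency of its own song, which is what lets the later plots put frequency on an axis.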
> ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampled.Song.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
> ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampled.Artist.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
We can quickly see that the strong representation of our more significant genres is driven in large part by heavy sampling of individual artists and songs. The general shapes are also quite similar to our overall data set, with a huge representation of the early 1970s. Let's compare this to our sampling artists and songs instead.
> ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampling.Song.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
> ggplot(factorized,aes(x=Sampled.Publishing.Date,y=Sampling.Artist.Frequency)) + geom_point(aes(color=Sampled.Genre)) + stat_density2d()
> SAFreq <- SAFreq[order(-SAFreq$Sampling.Artist.Frequency),]
> topArtistFrame <- subset(factorized, Sampling.Artist %in% SAFreq[1:20,1])
> ggplot(topArtistFrame,aes(y=Sampled.Publishing.Date,x=Sampling.Artist[,drop=TRUE])) + geom_jitter(aes(color=Sampled.Genre),position=position_jitter(height=.03)) + stat_boxplot(alpha=.2) + coord_flip()
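The top-artist selection and the `[,drop=TRUE]` in the plot call can be illustrated on a small made-up data set. In this sketch the `toy` frame and the cutoff of two artists are hypothetical. It shows that `subset()` keeps all the original factor levels, so the unused ones have to be dropped for the plot axis to list only the selected artists.

```r
# Hypothetical stand-in for the factorized data frame (values are invented).
toy <- data.frame(
  Sampling.Artist = factor(c("A", "A", "A", "B", "B", "C")),
  Sampled.Publishing.Date = c(1970, 1972, 1975, 1969, 1980, 1973)
)

# Count rows per artist and sort from most to least frequent.
freq <- as.data.frame(table(toy$Sampling.Artist),
                      responseName = "Sampling.Artist.Frequency")
freq <- freq[order(-freq$Sampling.Artist.Frequency), ]

# Keep only rows belonging to the two most frequent artists.
top <- subset(toy, Sampling.Artist %in% freq[1:2, 1])

# The subset still remembers every original level, including "C".
allLevels <- levels(top$Sampling.Artist)

# Indexing with drop = TRUE discards levels that no longer occur.
usedLevels <- levels(top$Sampling.Artist[, drop = TRUE])

print(allLevels)
print(usedLevels)
```

`droplevels(top$Sampling.Artist)` should give the same result and may read more clearly.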