Project Overview

by Shaun Ellis & Tom Engelhardt
Rutgers Graduate School of Communication & Information
Masters in Library & Information Science Program
17:610:554:85 - Information Visualization with Anselm Spoerri

Can visualizing 50 years' worth of hit U.S. pop song characteristics help us discover trends worthy of further investigation?

This website presents a project that Shaun and Tom worked on between September and December of 2010. This is not peer-reviewed research work, nor is it a study. It displays the results of an information visualization project that asked students to identify a large, abstract dataset, to visualize it, and to identify interesting trends that would have been impossible to detect without the use of visual aids. A further requirement of the project was to determine whether any of the trends spotted warranted further investigation through formal research. This is, in fact, one of the great things about visualizing data: it helps individuals initiate and work through a data-discovery process. Instead of reading thousands of rows of spreadsheet data, they use graphs, charts, and more sophisticated visual tools as cognitive aids, so that overall characteristics and trends become immediately apparent.

What follows in the rest of the website is a detailed explanation of the data domain, the data compilation methodology, the visualization tools and techniques employed, and the results of observations made on the visualizations. The authors do not offer any definitive prescriptive judgments on the production of a "hit" song.

The authors have identified some valid and interesting trends which do warrant further investigation, and are in the process of conducting this follow-up work. They encourage others to download the compiled data and contribute to the discovery process.

Is there a formula for a hit song?

What if we knew, for example, that 80% of the Billboard Hot 100 number one singles from 1960 to 2010 were sung in a major key with an average of 135 beats per minute, that they all followed a I-III-IV chord progression in 4/4 time, and that they all followed a "verse-verse-chorus-verse-chorus-bridge-chorus" sequence structure? What would this mean for the music industry? For artists and record producers? Would this teach us things about human auditory preferences? Or about how these preferences have been manufactured and masterfully fine-tuned over the past half century by cunning L.A. record execs?

This project analyzes data from some large music data APIs in order to look for these kinds of patterns in popular music. We want to study the anatomy of pop songs in the U.S. and determine whether there is, in fact, a formula for a hit. To be clear, visualizing hit data is only the first step toward identifying patterns in the data. Without a randomly sampled "control group" of non-hits by the same artists, we will not be able to establish any definite correlations or draw firm conclusions. Furthermore, neither of the authors truly believes there can be an unchanging "formula" for a hit song, as much of what appeals to us in music is a combination of familiarity and surprise (Sacks, 2006). Notions that are currently intangible, such as emotional qualities, also have a great effect on our experiences with music. However, while many questions remain outstanding, this inquiry may shed some light on a few characteristics that could increase the chances of a good song becoming a hit at a certain point in time.
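As a rough illustration of the kind of summary this question calls for, the sketch below computes the share of major-key songs and the mean tempo for a list of song records, overall and by decade. It is not the authors' method. The field names (year, mode, tempo) and the four sample records are invented for illustration and do not come from the project's compiled data.

```python
from statistics import mean

# Hypothetical records, invented for illustration only.
# They are NOT taken from the project's compiled dataset.
SAMPLE_SONGS = [
    {"year": 1964, "mode": "major", "tempo": 120.0},
    {"year": 1968, "mode": "minor", "tempo": 130.0},
    {"year": 1975, "mode": "major", "tempo": 100.0},
    {"year": 2003, "mode": "major", "tempo": 150.0},
]


def summarize(songs):
    """Return the share of major-key songs and the mean tempo (BPM)."""
    if not songs:
        raise ValueError("need at least one song to summarize")
    major_count = sum(1 for song in songs if song["mode"] == "major")
    return {
        "major_share": major_count / len(songs),
        "mean_tempo": mean(song["tempo"] for song in songs),
    }


def summarize_by_decade(songs):
    """Group songs by decade (1960, 1970, ...) and summarize each group."""
    groups = {}
    for song in songs:
        decade = song["year"] // 10 * 10
        groups.setdefault(decade, []).append(song)
    return {decade: summarize(group) for decade, group in sorted(groups.items())}


if __name__ == "__main__":
    print(summarize(SAMPLE_SONGS))
    for decade, stats in summarize_by_decade(SAMPLE_SONGS).items():
        print(decade, stats)
```

The same summarize function could be run on a control group of non-hits, if one were compiled, so that the two sets of figures can be compared side by side. A difference between them would still only suggest a trend worth testing formally.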

Sacks, O. (2006). Musicophilia: Tales of music and the brain. New York: Vintage Books.