The data is a subset of the larger movie ratings dataset we saw in a previous lesson.
The dataset contains roughly 300 movie ratings that were collected online in 1997-98.
The data has been filtered so that it only contains movies that were rated by at least 2 females and 2 males in the 14-18 year-old range.
We're first going to make a simple summary table that shows the average rating for every movie that's in the data.
Here is what we're going for:
In most spreadsheet programs a Summary Table is called a pivot table.
Do This: With the spreadsheet open in Google Sheets choose Data -> Pivot table...
Creating a new pivot table
The menu on the right side of the pivot table lets you choose what you want the rows, columns, and values to be in your summary table. We want to set it up so that:
Each row is one movie.
Each value is the average rating of that movie.
Do This: Follow the animation below to make a table that displays the average rating for every movie listed in the data set.
Setting the Rows to "movie," Values to "rating," and Summarize By to AVERAGE
The power of the pivot table is that it allows you to compute things you could never do by just filtering and sorting. The pivot table is doing a lot of computing behind the scenes for you - which is great - but you should understand what's really happening so you can make your own choices in the future. Here's a synopsis:
Rows - Group By: movie
Rows act like the major categories or groupings for which you want to calculate values.
The Computation: When you set the rows to be "movie," the software finds all of the unique movie titles in the raw dataset and puts one on each row. This is called aggregation, which is a fancy word that means grouping or clustering.
Values - Display: rating; Summarize by: AVERAGE
Values lets you specify the computation that should happen for each row.
The Computation: We're interested in the average rating for each movie, so for Values we choose rating, Summarize by: AVERAGE.
Change summarize by from AVERAGE to COUNT. Now, instead of computing the average rating, this will count the number of ratings for each movie.
Switching summarize by from AVERAGE to COUNT. The result shows the total number of ratings for a given movie.
Let's show both the average rating and the count side-by-side in the table. To do this we add another Values field. The count is already there, so let's add the average rating again. Now, for each movie we'll see the total number of ratings the average rating.
Making a table that shows the average rating and number of ratings for every movie in the dataset.
There's not much more to it than that. Once you get the hang of pivot tables they can be a very powerful tool for manipulating data. There are more advanced things you can do with a pivot table if you like, but you know enough now that you can probably just play around with the other settings and see what happens.
Key Ideas:
Summary tables (pivot tables) provide a way to visualize data. Yes, it's a table, but by aggregating and summarizing information from a large data set, summary tables allow you to see things in the data you might otherwise not see.
Summary tables allow you to manipulate and create new data. Even for our simple movies example here, the raw data didn't contain the average rating for every movie, or count how many ratings there were. We had to compute it, and the pivot table let us do that quickly and easily.
A summary table helps you look at your data in new ways. Think: how could data be grouped? What could be calculated? Once you know how to make a summary table you can begin to look at raw data and ask questions that you know might be possible to answer.
A summary table can be a first step toward a good visualization. Often it's difficult to make a meaningful chart or graphic out of raw data. You often want to summarize it first, then chart it!
Continue >