Data Summaries in Google Sheets

Overview

Throughout the course you will collect and analyze data.  To derive information from it, we often need summarize it numerically and graphically to aid interpretation.  The examples below walk you through producing some numerical and visual summaries of data (graphs) in Google Sheets. The spreadsheet that resulted in these graphs can also be viewed below or found in the Data Summaries spreadsheet.  

Data Summaries

Producing Summaries in Spreadsheets

The first tab in the data sheet (the Circumference-Diameter Experiment) tab uses a "toy" example to remove ecology at this point and get you started with summaries.  Remember geometry?  Let's consider relationships between the diameter and circumference of a circle.  

You can start a new sheet in Google Sheets or use another software package (Baruch computers have Excel typically installed, and commands are very similar).  

Next, you can enter some data.  Note the sheet is setup with lettered columns and numbered rows. We use the top, or header, row , to label what we are measuring and the units we are using.

Once you have the data entered, you can get numerical summaries using basic functions.  For example, you can type 

=average(B2:B9) 

and when you leave the cell the formula will update to show you the average.  You can also select the cells to include in the formula (to be demonstrated in class).  Note that in the function bar (highlighted area), you can see the formula entered in cell B11, while the cell itself shows the resulting value.  One benefit of this method is that the summaries will update if data changes.  Other functions are used throughout the course. Several are noted in the Summarizing Data & NYC Climate Change Patterns (remote) lab.  


You can combine these formulas for more advance analysis like confidence intervals.

Calcuating Confidence Intervals

Confidence intervals are covered in the Population Statistics (remote) lab as way to quantify uncertainty. They are also an important component of many graphs but are often hard to show (see instructions with bar graphs further down).  Key points to remember

To produce confidence intervals, first remember what we need:

We can use these respective functions in Sheets to find these values:

So to get the lower bound (remember, confidence intervals are a range, update the following formula to use your data:

=AVERAGE(data)−2*STDEV(data)/SQRT(COUNT(data)) 

Similarly, the upper bound can be found with 

=AVERAGE(data) + 2*STDEV(data)/SQRT(COUNT(data)) 

Note the above is giving you the actual confidence interval; below we will note using just the values (standard error or 95% confidence interval) to make an error bar.

Visual Summaries (Graphs)

To produce visual summaries, select the rows you want to include and select Insert Chart (highlighted).  You can then manipulate the type of plot you produce and how it uses the selected data. Examples are given below.  

Scatterplots

Scatterplots are a common way to visualize relationships between two continuous variables. 

This produces a chart with editing options.

In our case a scatter chart may be more appropriate (bar and column charts in general use a lot of color to mask a lack of information).  We can make this change (see Chart style tab) and update our title (see Chart & axis titles tab).  We can delete the legend (see Legend tab, set position to "none") since its not needed for this graph.

You can add a trendline to the data by selecting series and checking the boxes for add Trendline. You can get the equation for the trendline and R2  value here as well.  These values are explained in the lab on summarizing data.   

Switching the axes

Note you can also switch the axes! If you go to Setup in the Chart Editor and click on the X-axis or Y-axis, you can change the orientation. In general we want the dependent variable on the Y-axis (so we think changes in the X-axis variable is driving changes in the Y-axis variable).




Histograms in Google Sheets

Histograms are covered in the Population Statistics (remote) lab and are useful for considering the distribution of continuous data.  See an example in the Histograms and more functions tab.  First, enter your data in a new Google Sheet.  For a histogram you will only have data in one column!


Next, select your data, insert a chart, and select histogram.  Edit the resulting chart labels as needed.  Note that if your data is very small (or large), you may need to change the units manually to ensure x-axis bin labels make sense because the default format only allows for 2 units after the decimal place. For example, if you enter your been beetle weights in grams, all bin labels may be .00.  If you manually convert these to milligrams (divide by 1000) the bin labels will be more useful.

You can also display multiple histograms and other types of charts simultaneously (on one graph) so you can compare groups. See example in the 2 groups example: t-tests, bar chart, histograms  tab.

In general, when we are start adding more informaton to our charts, like groups, we need to think about the format we use. Wide data formats typically have a column for each group (with a first row that names the group). Data can also be in long format, where we instead add a column to designate the group.  Both are useful, and Google Sheets can move back and forth. The main issue is letting the software know you have different groups.  This typically means we want each group to be a Series in Google sheets. This allows us to do things like use different colors for each group, summarize data by group, and add specific confidence intervals for each group.  

 To do this for histograms, after you create a histogram chart from one column using wide format data, go to Setup and choose Add Series. Then click on the table icon (circled in red) and add another column.


Note you may need to clean up the axes for the resulting graph.  You can also use the Histogram > Bucket Size option to make the bins a more "normal" size.  

Bar and Column Charts

Bar and column charts are another way to compare groups. They are like histograms but slightly different.  While a histogram gives you a count of how many pieces of continuous data fall into various bins, the others summarize the data for each group in some way  (so each bar is equal to a group and show a statistic). For example, the first histogram we developed above shows how how many beetles weigh between 3.07 and 3.42, how many weigh between 3.42 and 3.79, etc).  See some examples in the 2 groups example: t-tests, bar chart, histograms  tab. Note again how we can present the same data in multiple correct ways in terms of both spreadsheet layout and charts.  

To make bar or column charts its usually easiest to work with data in long format (one record per row, like on the right side of the spreadsheet).  You select multiple columns of data (one showing which group the data is from, one showing the data), insert a bar chart, and CHECK "Aggregate". You'll also need to look under Customize > Series and tell Sheets which statistic you want (typically the average).  

Alternatively, you can calculate summary statistics for various groups (like the mean and standard error) and use them to produce bar charts with confidence intervals.    Find the table below in the 2 groups example: t-tests, bar chart, histograms  tab. You can then select just the Control and mean portions and put them in a bar chart (or even a scatter plot, like I show below). Next, add another series focused on the treatment.  

You can clean up titles and axes here, but the next step is to go to Customize>Series> and then for each series (Control and Treatment) add in  a Constant error bar using the value you calculated for your 95% CI.  

This works for wide format data, but another option is to use pivot tables to summarize long format data.

Pivot Tables

Pivot tables make it easier to analyze, summarize, and eventually plot long format data (which is more common).  For an example, see  look in the 2 groups example:pivot tables tab.  This is the same example data we just used, but now only showing the long-format data.  To  make the pivot table, highlight only the cells you need; you can grab entire columns, but you  may end up with weird "0" or no header columns. Then select Insert > Pivot table

You can then create (or place) the table in a new sheet or in a cell in an existing sheet. I usually put it next to my dataset.

After you create it, the Pivot table editor opens. You can use it to make tables with various Rows and Columns, which can be filled with different Values. You can initially drag your data columns into these groups, but for more advanced work you may need to use the Add button. You can also filter the table to only include certain rows of the original data.

Here I made a group for each row.  I then made values that included the average mass (by selecting average for the Summarize by). Note you can also turn row and column totals on and off.

Use the red button to select/update data range.

I could use this summary to immediately make graphs.  Like our earlier appraoch, the tables and connected graphs will update if we change the data; if we add more cells to our data, though, we'll need to change the range  (see above image and caption).  Once created, you may need to change the Switch rows/columns and  Use columns.. as headers options to get 2 series.  Sometimes not selecting the pivot table header row also helps.

I  can also add another summary focused on the same outcome in the Values area by using the Add button ; dragging won't work as the variable is already there. For example, I can add counts and standard deviations for each group, then calculate the standard error of the mean and the ~95% confidence interval.  See the sheet for examples and formulas. You can then add this information to the graph for each series.

Customizing  Graphs

Note the Series menu also allows you to change the "look" of your graph in regards to colors, shapes, and other variables depending on the graph type.  You can also change the presentation to a log axis here.