Once data are collected, we often need to summarize them numerically and graphically to aid interpretation. This lab begins with an online climate data exploration tool, which we use to visualize climate trends since 1895 in New York City's Gramercy neighborhood, where Baruch College is located. We then analyze these data in a spreadsheet to estimate central tendency (mean, median, and mode) and to make graphs with appropriate elements that quantify long-term trends. Summarizing long-term climate data prepares us for future lab activities, in which we evaluate links between climate change and the location, abundance, and behavior of organisms. Quantifying the variation in data sets (e.g., distributions, variance, confidence intervals) is covered later in the Population Statistics lab activity.
Statistical background reading can be found in an introductory statistics book (see one here), but we'll focus only on the basics for this lab and the overall course.
For instructions on how to produce visuals in Google Sheets, check out this page.
Students will be able to use an online data exploration tool to obtain climatic data and visualize long-term trends
Students should be able to take a dataset and:
Produce estimates of the minimum, maximum, and range
Produce estimates of the mean, median, and mode
Produce graphs with appropriate elements (title, axes labels and unit markers, data points)
Interpret graphs to describe long-term trends in climatic parameters
Data are recorded information, usually in the form of numbers, that can be examined to discover patterns and trends. Ecologists collect a wide variety of data to understand the abundance and distribution of organisms on the planet, as well as their relationships with each other and with their environments. While data are typically collected from individual organisms (e.g., individual height), we usually can't measure every single organism in an entire population. As a result, we collect and summarize data from a sample of individuals and use them to estimate traits of the population (e.g., average height in a population). Statistics then allows us to determine how accurate our population estimates are and how much confidence we can place in them.
Today we'll use a long-term climate data set to practice data analysis and summary skills. Analyzing and summarizing data lies at the heart of understanding ecological patterns and relationships. Data collected by scientists can be summarized in two main ways: numerical and visual summaries. Numerical summaries describe the distribution of the data, such as the mean (average) and the amount of variation. Visual summaries include graphs and charts that use lines and symbols to represent trends in the data.
Numerical Summaries
Numerical summaries help us understand general trends in the data. A common example is the mean, which is the average value of some trait in a population. If we have n pieces of data, we add them together and divide by n to get the mean. Another example is the median: the value that falls in the center of the data set when the values are arranged in order. For data sets with an even number of values, the median is the average of the two middle values. A third example is the mode, the value that occurs most often in the data set.
If there are any extremely large or small values in the data set (outliers), they will pull the mean up or down. In this case, the mean doesn't give us the best representation of the population because it's swayed by outliers. The median, by contrast, is largely unaffected by outliers. You may have heard news reports on the "median home price" or "median household income"; the median is used there so that the numerical summary isn't skewed by a few extremely high values.
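The definitions above can be checked by hand with a few lines of code. This is a minimal Python sketch (Python is our choice here; the lab itself uses spreadsheets), with made-up numbers and helper names that are purely illustrative. It implements the mean and median exactly as described, including the even-count rule, and uses Python's built-in `statistics.mode` for the most frequent value.

```python
from statistics import mode  # most frequently occurring value


def mean(values):
    """Add the n values together and divide by n."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up example data (illustrative only).
odd_data = [3, 1, 4, 1, 5]       # sorted: 1, 1, 3, 4, 5
even_data = [3, 1, 4, 1, 5, 9]   # sorted: 1, 1, 3, 4, 5, 9

print(mean(odd_data))     # 2.8
print(median(odd_data))   # 3
print(median(even_data))  # 3.5 (average of 3 and 4)
print(mode(odd_data))     # 1
```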
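To see how one outlier affects each measure, here is a small Python sketch using invented monthly precipitation totals (hypothetical values, not from the lab data set). Adding a single extreme month shifts the mean by about 50 mm but moves the median only slightly.

```python
from statistics import mean, median

# Hypothetical monthly precipitation totals in mm (illustrative only).
typical = [80, 85, 90, 95, 100]
with_outlier = typical + [400]  # one extremely wet month

print(mean(typical), median(typical))  # 90 90
print(round(mean(with_outlier), 1), median(with_outlier))  # 141.7 92.5
```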
Visual Summaries
Visual summaries take numbers and turn them into images. Just like numerical summaries, they distill data so that general trends and points are easier to understand. One common form of visual summary is a graph. Different types of graphs exist (e.g., bar, line, histogram), but all share some general characteristics. Graphs should contain:
Axes labels with units - Each axis (side of the graph) should be labeled. What was measured (length, mass, time, etc.), and in what units (e.g., metric units)?
Visual representation of the data - Points, bars, or other shapes that represent the data.
A descriptive title - Not just "variable 1 vs variable 2"! This is the default title in Google Sheets and Excel, and it's awful. It tells us nothing about the relationship between these variables, only that they're graphed... but we already know this just by looking at the axis titles! Instead, our titles should always state the relationship between the variables to aid interpretation. A good template to practice is "Y-Trend-With-X": state the y-axis variable, describe the type of trend, then state the x-axis variable. Let's try: if we made a graph showing faster plant growth at higher temperatures, we might title it "Plant growth increases with temperature". To make the title even better, quantify how much plant growth increases: "Plant growth increases 25% with each additional degree Celsius".
Other elements that may be included are legends (useful if you are displaying data from different groups), lines of best fit (trendlines), confidence intervals, and so on.
Linking Numerical & Visual Summaries: Trendlines & R² Values
When making visual summaries of data, we should also include basic mathematical descriptions of the visual trends. This is easily done by fitting a trendline to the data and evaluating both its equation and its R² value (pronounced R-square). A trendline is a line drawn through a cloud of data points that best "fits" the data (i.e., for a straight-line trendline, it minimizes the sum of the squared vertical distances between the data points and the line). In other words, the trendline depicts the average relationship between x and y in a graph. The mathematical relationship between x and y is then described by the trendline equation and the R² value:
Trendline equation - Recall the general formula for a straight line, y = mx + b, where m is the slope, x is any value on the x-axis, and b is the y-intercept. The slope m is the change in y for each one-unit increase in x. A positive value of m indicates an upward (positive) slope, and a negative value of m indicates a downward (negative) slope. We can use this equation to calculate a y-value for any value of x. For example, using the equation y = 2x + 5, if x = 2 then y = 9. You'll use this technique to evaluate and predict climate patterns below.
R² value - This value tells you how well the trendline fits the data (i.e., the strength of the linear relationship between the x and y variables). R² values range from 0 to 1. A value of 1 means a perfect fit (every point falls right on the trendline), while a value of 0 indicates no linear relationship between the variables. The higher the R² value, the stronger the fit between x and y. For example, if R² = 0.65, we would say "x explains 65% of the variation in y".
In this activity you'll produce scatterplots (scatter charts), fit trendlines to the data, and then use trendline equations and R² values to quantify and interpret the trends. Scatterplots are widely used to show the relationship between two variables. For instructions on how to produce visuals in Google Sheets, check out this page.
Ecologists often evaluate how living systems respond to climatic variability. This requires access to climate data as well as the ability to analyze them. To get started with accessing and visualizing climate data, we’ll head to the PRISM Climate Group's Data Explorer. This is a website run by climate scientists at Oregon State University and serves as one of the primary data portals for downloading climatic data. On this site you’ll be able to quickly visualize climate trends for NYC going back to 1895.
Go to the PRISM Climate Group's Data Explorer. Orient yourself to the four main areas of this tool: Location, Data Settings, Controls, and the map. On the map, notice the grid of rectangles (grid cells). Climate data are available for each individual grid cell. For example, you could look at lower Manhattan climate trends separately from those in midtown or upper Manhattan.
At the bottom of the Data Settings options, click on the option for “Interpolate grid cell values”. This tells the tool to combine values from several nearby grid cells so that climate data can be estimated for one specific point, rather than for the grid cell as a whole.
Go back up to the Location options, click on Coordinates, and enter these numbers: [Latitude: 40.7394] [Longitude: -73.9846] (make sure you include the negative sign).
Click on “Zoom to location”. The map should adjust, with a red circle appearing at the corner of Lexington Avenue and 23rd St in Manhattan (the location of Baruch’s 17-Lex building).
In the Data Settings box, highlight only Precipitation and Mean Temp (deselect any others)
Select Monthly Values, and set them like this: [Start: January 1895] [End: December 2018] with [Units: SI (metric)]
Under Controls, click “Retrieve Time Series”. A graph will appear below, showing precipitation and temperature values for Baruch’s neighborhood from January 1895 to December 2018. You don't need to download the time series, because we provide a data set for this location that has already been prepared for you. Answer the following questions.
1. What trends can you observe in the climate data? Use the graph y-axes to visually estimate average precipitation and average temperature. Record them here, being sure to report proper units in your answer:
Average precipitation 1895-2018:
Average temperature 1895-2018:
What do the average temperatures tell you?
2. Look specifically at the variation in the precipitation and temperature graphs. Are there any visible patterns in the precipitation data? How about in the temperature data? Describe them here. Consider differences within and among years, as well as early-century vs. late-century differences.
3. Visually compare the precipitation and temperature trends. For these data, do you think these climate factors are related to (correlated with) each other? In other words, do high-precipitation years tend to occur during high- or low-temperature years? How confident are you in your ability to visually detect a correlation?
4. Now look at the third graph, at the bottom. This slider lets you zoom in to see the month-by-month values for any time period. Using the sliders, zoom in to the year with the highest precipitation value. What’s going on here? Was the entire year unusually wet, or did some type of short-term weather event cause this?
5. Once you’ve answered the previous question, search the internet for the potential cause. Search terms might include “NYC precipitation + time period”, where you replace the time period with the year and/or month(s). What happened during that time? How are short-term weather events different from long-term climate trends?
6. What other major climatic events can you see in NYC climate history? For example, what were climate patterns like during significant historical events in NYC, such as the stock market crash of 1929, the Stonewall uprising of 1969, or Hurricane Sandy in 2012? What else can you find?
Acquiring raw climate data from online portals generally requires considerable effort to prepare the data for analysis. For this activity we’ve already downloaded climate data from PRISM and prepared them for you. We have not sorted the data in any particular order, so that you can practice that step. Download our climate data set and save a copy to your own Google Drive.
Metadata
Metadata are descriptions of a data set (data about data), something we’ll discuss in later lab activities. The data set we’re using contains the following information:
Year (1895-2018)
Month: either 2 (February) or 8 (August). Notice that each month has its own worksheet.
Precipitation (mm): Total precipitation accumulated over the month. Example: If there were 5 days of precipitation in one month, and each of those days had 10 mm, the accumulated precipitation for the month would be 50 mm.
Tmean (°C): Mean daily temperature, averaged over the month. Example: If today’s coldest overnight temperature (Tmin) were 10°C, and the warmest temperature (Tmax) were 20°C, the mean daily temperature would be 15°C. When the daily Tmean value is averaged across the entire month, that value is what appears in our data set.
Tmin (°C): Minimum daily temperature, averaged over the month.
Tmax (°C): Maximum daily temperature, averaged over the month.
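The two monthly calculations described in the metadata (a summed precipitation total and an averaged daily mean temperature) can be sketched in a few lines of Python. All daily values below are hypothetical. The temperature lists cover only three days to keep things short, and the precipitation list mirrors the 5-days-of-10-mm example. We assume, as in the Tmean example above, that daily Tmean is the midpoint of Tmin and Tmax.

```python
from statistics import mean

# Hypothetical daily temperatures for a short stretch of days (°C), illustrative only.
daily_tmin = [10.0, 12.0, 8.0]
daily_tmax = [20.0, 22.0, 18.0]

# Daily mean temperature: halfway between Tmin and Tmax (assumption based on the example above).
daily_tmean = [(low + high) / 2 for low, high in zip(daily_tmin, daily_tmax)]  # [15.0, 17.0, 13.0]

# Monthly Tmean: the daily values averaged over the period.
monthly_tmean = mean(daily_tmean)  # 15.0

# Hypothetical 30-day month: 5 days with 10 mm each, 25 dry days.
daily_precip = [10.0] * 5 + [0.0] * 25

# Monthly precipitation: daily amounts accumulated (summed), not averaged.
monthly_precip = sum(daily_precip)  # 50.0
```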
Essential formulas for spreadsheet calculations
In this data analysis activity, as well as in many subsequent lab activities, we'll use spreadsheets to calculate numerical summaries. The following are some of the formulas we'll use throughout the semester:
Name | Formula | Function
Mean | =AVERAGE() | Calculates the average value of a data range
Median | =MEDIAN() | Calculates the median value of a data range
Min | =MIN() | Identifies the minimum value of a data range
Max | =MAX() | Identifies the maximum value of a data range
Sum | =SUM() | Calculates the sum of a data range
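If it helps to see the spreadsheet functions side by side with a general-purpose language, here is a minimal Python sketch of the same summaries on a hypothetical data range (the values and the `data_range` name are illustrative stand-ins for a spreadsheet column). Spreadsheets have no single range function, so the range is computed here as MAX minus MIN.

```python
from statistics import mean, median

# Hypothetical values standing in for a spreadsheet column such as B2:B6.
data_range = [2.5, -1.0, 0.5, 3.0, -0.5]

summary = {
    "AVERAGE": mean(data_range),    # 0.9
    "MEDIAN": median(data_range),   # 0.5
    "MIN": min(data_range),         # -1.0
    "MAX": max(data_range),         # 3.0
    "SUM": sum(data_range),         # 4.5
}
summary["RANGE (MAX - MIN)"] = summary["MAX"] - summary["MIN"]  # 4.0

for name, value in summary.items():
    print(name, value)
```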
Create numerical & visual summaries
7. First, let's sort the February data set by year in ascending order (from past to present). There are several ways to do this.
You can highlight all the columns of the data set, including the header row (the names at the top of each column), so that each year's values stay together when sorted. Then open the Data menu and select "Sort range". Check the box for "Data has header row" to exclude the header row from sorting and to let you select columns by name, then pick the column you want to sort by.
If you only want to sort one column, or you want to sort a block of columns by its first column, you can highlight just the data (not the headers at the top of each column) you wish to sort. Then open the Data menu and choose the A to Z sort option. Note that sorting a single column on its own moves only that column's values, so the rows in the other columns will no longer match.
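Why it matters to sort whole rows together can be shown with a small Python sketch. The rows below are hypothetical (year, precipitation in mm, Tmean in °C) values, not the lab data. Sorting the rows by their year keeps each record intact, while sorting the year column alone pairs years with the wrong values.

```python
# Hypothetical, unsorted rows: (Year, Precipitation in mm, Tmean in °C). Illustrative only.
rows = [
    (1950, 70.0, 0.5),
    (1895, 88.0, -1.0),
    (2018, 95.0, 2.0),
    (1920, 60.0, -3.0),
]

# Sort whole rows by Year (ascending), so each year keeps its own values.
rows_by_year = sorted(rows, key=lambda row: row[0])
print(rows_by_year[0])  # (1895, 88.0, -1.0)

# Sorting only the Year column leaves the other values in their old order,
# so years and values no longer line up.
years_only = sorted(row[0] for row in rows)
mismatched = list(zip(years_only, (row[1] for row in rows)))
print(mismatched[0])  # (1895, 70.0)
```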
Repeat this step for the August data. With that done, use either the Sort function or the MAX and MIN functions to identify the following climatic extremes:
What’s the coldest winter on record, according to these data? (i.e., what year had the coldest February Tmin?)
What’s the warmest summer on record, according to these data? (i.e., what year had the warmest August Tmax?)
In which month (February or August), and in which year, was precipitation the all-time lowest?
In which month (February or August), and in which year, was precipitation the highest?
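MIN and MAX return only the extreme value, so you then need to look up which year it belongs to. The Python sketch below shows one way to get both at once. It uses hypothetical records (illustrative values, not the lab data set) and passes a `key` function so that `min` and `max` return the whole record.

```python
# Hypothetical February records (illustrative only, not the lab data set).
february = [
    {"year": 1900, "tmin": -6.0, "precip": 60.0},
    {"year": 1950, "tmin": -8.5, "precip": 85.0},
    {"year": 2000, "tmin": -4.0, "precip": 40.0},
]

# Using key= returns the whole record with the extreme value, so the matching year comes along.
coldest = min(february, key=lambda rec: rec["tmin"])
wettest = max(february, key=lambda rec: rec["precip"])
driest = min(february, key=lambda rec: rec["precip"])

print(coldest["year"], coldest["tmin"])  # 1950 -8.5
print(wettest["year"], driest["year"])   # 1950 2000
```

To compare across both months, apply the same lookup to the August records and compare the two results.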
8. Calculate the following numerical summaries for the February data set. Include units in your answers.
Average precipitation 1895-2018:
Median precipitation 1895-2018:
Average Tmean 1895-2018:
Median Tmean 1895-2018:
9. Repeat the previous step with August data:
Average precipitation 1895-2018:
Median precipitation 1895-2018:
Average Tmean 1895-2018:
Median Tmean 1895-2018:
10. In your answers for 8 and 9, were the median values similar to the average values? Which of these measures is more likely to be influenced by extreme values, the mean or the median?
11. For the February data, create a scatterplot of Tmean vs. Year (i.e., Tmean on the y-axis and Year on the x-axis). Add a trendline, and display its equation and R² (R-square) value. Make sure your x-axis and y-axis have titles and units. Based on the trendline and its equation, how much has average winter temperature changed in NYC since 1895? Include a copy of your graph in your answer.
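One way to turn a trendline equation into a total change is to evaluate it at the first and last years and subtract. The Python sketch below uses hypothetical coefficients (not the real February trendline, which you will read from your own graph). Because the intercept cancels out, the result is simply slope × (2018 − 1895).

```python
# Hypothetical trendline coefficients (NOT values from the lab data set).
slope = 0.02       # m, in °C per year
intercept = -40.0  # b, in °C


def trendline(year):
    """Predicted Tmean (°C) for a given year, from y = m*x + b."""
    return slope * year + intercept


start_year, end_year = 1895, 2018
total_change = trendline(end_year) - trendline(start_year)

# The intercept cancels, so this equals slope * (end_year - start_year).
print(round(total_change, 2))  # 2.46
```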
12. In the previous graph, how strong is the relationship between Tmean and Year? How can you determine that?
13. Repeat the previous graphing step with the August data, being sure to include all necessary components (titles, units, trendline, R²). Based on the trendline and its equation, how has average summer temperature changed in NYC since 1895? Include a copy of your graph in your answer.
14. Compare your graphs from 11 and 13. Which month has a stronger correlation between Tmean and Year? How can you determine that?
15. For the February data, create a scatterplot of Precipitation vs. Year. Add a trendline, and display its equation and R² value. Based on the trendline and its equation, how much has average winter precipitation changed each year in NYC since 1895? Include a copy of your graph in your answer.
16. Repeat the previous step with the August data. Based on the trendline and its equation, how much has average summer precipitation changed each year in NYC since 1895? Include a copy of your graph in your answer.
17. Go back to your graphs of “Tmean vs Year” and compare February and August. Are winter and summer temperatures changing at a similar rate, or is one season changing faster than the other? How can you quantify the difference between these rates?
18. For our last graph, let’s use the long-term climate data set in a different way. Instead of evaluating long-term trends in these factors, let’s evaluate their relationship with each other in order to predict future climate! We know that NYC summer temperatures have increased over the past century, and in a previous question you quantified how much. What kind of change in precipitation might we see in the future as NYC summer temperatures continue to rise? Create your own graph to answer this question, including all of the necessary components, and provide a brief interpretation or summary. Include a copy of your graph in your answer.
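Since each slope is a rate (°C per year), two seasons can be compared by the difference between their slopes or by the ratio of the two. The Python sketch below uses hypothetical slopes (illustrative only; which season is faster in your own data may differ). The ratio is only meaningful when both slopes are positive.

```python
# Hypothetical slopes (°C per year) from two trendline equations; illustrative only.
slopes = {"February": 0.020, "August": 0.010}

faster = max(slopes, key=slopes.get)
slower = min(slopes, key=slopes.get)

difference = slopes[faster] - slopes[slower]  # extra °C per year in the faster season
ratio = slopes[faster] / slopes[slower]       # how many times faster (both slopes positive)
per_century = {month: s * 100 for month, s in slopes.items()}  # °C per 100 years

print(faster, round(difference, 3), round(ratio, 2))  # February 0.01 2.0
```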
Take a moment to reflect on the overarching outcomes of your analysis. What do these results mean to you personally? What about professionally? Is there anything that's surprising? This is an open-ended question with no correct or incorrect answer.