STATISTICS AND RESEARCH METHODS
A course in statistics using spreadsheets and resampling, in the context of quantitative research methods and reasoning.
Some common statistical techniques and research methods are widely used (in the life sciences, at least). Understanding the most common of them could be useful for students in many fields of science.
As part of an undergraduate course on statistics and research methods, I have created a set of 22 activities that guide students through the process of quantitative research. The activities provide a structured course format that may increase engagement and learning of the course material (Eddy and Hogan, 2017). The activities are organized into three main sections:
SECTION 1. WHY do we need probability and statistics to help us make decisions?
SECTION 2. WHAT are scientific models? How can data lead to scientific understanding?
SECTION 3. HOW can we design research to help make robust discoveries?
A syllabus that provides more detail about how the activities were implemented in the context of a course is shown below. An example of one of the activities (on normal distributions) is also provided below.
The activities are consistent with the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendations to:
1. Teach statistical thinking.
Teach statistics as an investigative process of problem-solving and decision making.
Give students experience with multivariable thinking.
2. Focus on conceptual understanding.
3. Integrate real data with a context and purpose.
4. Foster active learning.
5. Use technology to explore concepts and analyze data.
6. Use assessments to improve and evaluate student learning.
Some of the main objectives of my approach to teaching statistics and research methods were:
A) To encourage students to construct knowledge about statistics and research methods based on an understanding of the reasons and principles that have led to research practices. For example:
Understanding why statistics are necessary (because of cognitive biases and logical and practical constraints).
Understanding how statistics are based on probability and counting.
Understanding the importance of statistical and scientific MODELS for research.
Understanding how general techniques such as “normalization” and “variance accounted for” can contribute to both statistics and research methods.
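As a rough illustration of the two terms in the last item, the Python sketch below assumes that “normalization” means converting raw values to z-scores and that “variance accounted for” means R², the proportion of the variance in y explained by a least-squares line on x. The course works with spreadsheets and may use these terms more broadly (normalization can also mean other rescalings, such as converting counts to rates). The function names and data values are hypothetical, for illustration only.

```python
import statistics

def z_scores(values):
    """Normalize: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def r_squared(x, y):
    """Proportion of the variance in y accounted for by a least-squares line on x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    # Slope and intercept of the least-squares line.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in y versus variation left over after the fit.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_residual / ss_total

# Made-up example values, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(z_scores(x))
print(round(r_squared(x, y), 3))
```

A useful self-check: z-scores always have a mean of 0 and a standard deviation of 1, and R² equals 1 only when every point falls exactly on the fitted line.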
B) To encourage active learning using real data. A primary goal (particularly for online learning) was to make statistics and research methods a “hands-on” course. I used several methods to encourage active learning:
Encouraging students to learn how to use spreadsheets (Google Sheets, although Excel could be used for most activities) instead of more powerful statistical software. In my estimation, knowing how to use spreadsheets is a fundamental college skill. However, many students do not know how to use spreadsheets, especially how to use functions to perform calculations. Therefore, the activities are built around problem-solving with spreadsheet functions.
(I appreciate that using spreadsheets may seem anachronistic. Many other resources are built around more powerful platforms such as R, and programming languages such as R and Python are widely used in many areas of data science. However, I see several advantages of using spreadsheets. First, spreadsheets are easy to use and ubiquitous; they are designed for non-programmers. Spreadsheets do not involve the abstraction of a text-based programming language, which is very challenging for many students. Second, the relative ease and simplicity of spreadsheets allow assignments to involve mathematical problem-solving without extensive programming training. For example, students can implement their OWN resampling models to test statistical hypotheses. Granted, the models implemented in spreadsheets are much less powerful than equivalent R functions, but by implementing the functions themselves, students can understand how processes such as resampling work. Third, using spreadsheets and spreadsheet functions could help students learn programming later. Spreadsheets can be viewed as a kind of functional programming language, and they share many features with most programming languages (data types, functions, assignment, conditionals, even some looping capability). Learning spreadsheets could help students learn some fundamentals of programming before trying to master the abstractions and challenges of text-based languages and development platforms. Finally, many students (most of mine) do not plan on pursuing careers that require powerful statistical programming packages, whereas learning to use spreadsheets is likely to be useful in most vocations. For all of these reasons and more, I maintain that spreadsheets are an appropriate choice for active learning of statistics and research methods concepts in many fields.)
Introducing resampling (e.g., “bootstrapping”) before parametric statistics. The worksheets (and associated spreadsheets) teach students how to set up their own “experiments” that use resampling to test statistical hypotheses. Resampling lets students actively construct their own sampling distributions and visualize the processes that underlie statistical tests.
Illustrating course concepts with real, currently relevant data. The activities draw many of their examples from the COVID-19 pandemic and others from publicly available datasets such as the 500 Cities Project.
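The resampling idea in the list above (build a sampling distribution by resampling, then ask how often chance alone produces a result as extreme as the one observed) can be sketched in Python. This is not the course's spreadsheet implementation; it is a minimal sketch assuming a two-group comparison of means and a pooled bootstrap test, which is one of several resampling tests. The function name `bootstrap_difference_test` and the example numbers are hypothetical.

```python
import random
import statistics

def bootstrap_difference_test(group_a, group_b, n_resamples=10_000, seed=1):
    """Two-sided pooled bootstrap test of H0: both groups come from one population.

    Under H0 the group labels do not matter, so the groups are pooled. Each
    resample draws new groups of the original sizes from the pool WITH
    replacement, building a sampling distribution of the difference in means
    that chance alone would produce.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)

    null_diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(pooled, k=n_a)
        resample_b = rng.choices(pooled, k=n_b)
        null_diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))

    # p-value: fraction of resampled differences at least as extreme as observed.
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed))
    return observed, extreme / n_resamples, null_diffs

# Made-up example data, for illustration only.
treated = [12, 15, 14, 16, 18, 17]
control = [10, 11, 13, 9, 12, 11]
observed, p_value, null_diffs = bootstrap_difference_test(treated, control)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

Pooling the two groups is what makes the resampled differences represent the null hypothesis. A permutation test, which shuffles the pooled values without replacement, is a closely related alternative. Plotting `null_diffs` as a histogram shows students the sampling distribution they have just constructed.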
C) To integrate statistics into a broader context of research methods and scientific reasoning. Statistics is only one link in a chain of scientific reasoning, and the activities place statistical methods within that larger context. For example:
Understanding why science involves so much “negativity”, e.g., why logic requires the somewhat counterintuitive reasoning of testing and rejecting null hypotheses.
Understanding how statistical hypotheses relate to research hypotheses (research hypotheses being both general scientific models and measurable predictions).
Analyzing the mathematics of basic statistics to discover that statistics is based on comprehensible principles of counting and algebra.
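To illustrate the claim that statistics rests on counting, here is a minimal Python sketch (a hypothetical example, not taken from the activities). It finds the probability of getting 8 or more heads in 10 flips of a fair coin, first by listing all 2^10 equally likely sequences and then by the binomial counting formula. Under the null hypothesis that the coin is fair, this probability is a one-sided p-value.

```python
from itertools import product
from math import comb

# List every equally likely sequence of 10 fair coin flips (2**10 = 1024 of them).
outcomes = list(product("HT", repeat=10))
at_least_8 = sum(1 for seq in outcomes if seq.count("H") >= 8)
p_counting = at_least_8 / len(outcomes)

# Same probability from the counting formula: C(10,8) + C(10,9) + C(10,10) ways.
p_formula = sum(comb(10, k) for k in range(8, 11)) / 2**10

print(at_least_8, len(outcomes), p_counting, p_formula)  # 56 1024 0.0546875 0.0546875
```

Because 0.0547 is slightly above the conventional 0.05 cutoff, 8 heads in 10 flips would not, by this one-sided test, be enough to reject the hypothesis that the coin is fair.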
I have tried to make the activities accessible and conversational, to repeat important concepts throughout (in my experience, repetition is essential), and to structure the activities so that students “discover” many of the important concepts through their own problem-solving.
Do they work? I have only anecdotal experience. My sense is that the class experience is challenging and intense, but the students gain an understanding of statistical and research concepts, and successfully learn how to set up and solve problems using spreadsheet functions. Of course, everything is a work in progress. In the future, I hope to port the activities to a platform such as Jupyter Notebook, which could allow for direct assessment of effectiveness.
Another disclaimer: I am NOT a statistician. There are probably errors in the activities (hopefully minor ones ;-). Some of the terminology is inconsistent and needs to be standardized. Moreover, nothing has been copy-edited by anyone but me, so there are formatting inconsistencies and other rough edges that need to be addressed. I will provide MS Word documents so that people can adapt them to their own needs. I am thankful to Danielle Navarro, whose excellent open-access materials on statistics provided inspiration.
Just in case others are interested in using the approach (particularly those who are tasked with leading synchronous or asynchronous online classes and who do not want to record even more lectures for students to watch), I am willing to share the entire course (MS Word format). A sample of the course activities is below. All of the activities can also be found in the "Book Version" of Research Methods/Reasoned Writing.
I am willing to share the entire course with almost anyone. However, I am not willing to share my materials with institutions that use discriminatory hiring or enrollment practices (e.g., institutions that require religious affirmations or have other prejudiced policies). Your institution wants to be exclusive? You’ve excluded yourself, sorry. If your institution requires religious affirmations for employment or enrollment, please do not use any of the materials on this site.
For anyone else who is interested, send me a message (firstname.lastname@example.org) and I can provide a link to all the course materials. Below is an example syllabus:
An example of one of the course activities. Most weeks, students complete two activities, which are discussed during synchronous lecture/discussion periods.
A Table of Contents for the 22 activities (from the "Book Version") is:
SECTION 1: STATISTICAL RESEARCH METHODS
1) ESTIMATING PROBABILITIES
2) USING SPREADSHEETS
3) COGNITIVE BIASES
4) POPULATIONS, SAMPLES, AND RESAMPLING
5) PROBABILITY
6) CONDITIONAL PROBABILITY
7) REASONING
8) SCIENTIFIC MODELS AND PREDICTIONS
9) MEASUREMENTS
10) SAMPLES AND POPULATIONS
11) DESCRIPTIVE STATISTICS
12) FREQUENCY AND PROBABILITY DISTRIBUTIONS
13) HYPOTHESIS TESTING
14) CUMULATIVE DISTRIBUTION FUNCTIONS
15) THE NORMAL DISTRIBUTION
16) CONFIDENCE INTERVALS
17) Z TESTS AND T TESTS
18) “GOODNESS OF FIT” AND CHI SQUARE TESTS
19) LOGICAL FALLACIES AND HYPOTHESIS TESTING
20) CORRELATION AND REGRESSION
21) MULTIPLE COMPARISONS
22) RESEARCH DESIGN