Introduction to Statistics

Introduction

Here I have collected some ideas about how to teach Math 1150, Introduction to Statistics.

First day project

Please read about first day projects in general at this link. Here is the activity itself and instructions:

Math 1150, Introduction to Statistics | 1.1 Statistical versus mathematical questions.docx | Instructions

The main point of this project is for students to start to get to know each other, for them to get comfortable talking in class, and for them to start to see ways in which an introduction to statistics class will be different from all of their previous mathematics classes. This should come as a relief to many of them!

Introducing graphical displays of data, specifically the histogram

One way to introduce a new topic is to use the PowerPoint provided by the textbook publisher, which tends to be heavy on words and vocabulary and is often hard for students to make sense of. I would much rather start with concrete examples and concrete data. The PowerPoint below has the virtue that it starts with (simulated) data, organizes the data a bit by sorting, and then changes over to graphical displays, which make it much easier to understand the distribution of the data.

Introducing graphical displays of data.pptx

My script for this PowerPoint goes something like this.

Slide 1. Just an introduction slide. Delete it if you have a better one.
Slide 2. Here is a sample of 165 numbers representing household income. These are numbers collected by the US Census Bureau, which seeks to understand who is living in the US and lots of additional information about them. All numbers have been rounded to the nearest hundred. What do you notice about the household incomes listed here? What can you learn about a typical household income by looking at the numbers? What can you learn about the range of incomes? Minimum? Maximum?
- Note: Students may not come up with very much. It's hard to make sense of 165 big numbers like this. Be patient!
- Note: US Census Bureau definition of a household at this link.
- Note: US Census Bureau definition of income at this link.
- Note: The numbers displayed in the PowerPoint are not actual household income numbers; they are simulated from the distribution of household incomes in $5,000 ranges, as found at this link. I don't have any worries about saying that the numbers in the PowerPoint are household income numbers. For the purposes of an introduction to graphing data, they work just fine. If you want to generate your own data, you can use this MATLAB script and this data file.
Slide 3. On this slide, I have sorted the household income numbers in the sample from smallest to largest. That gives the numbers some organization. Now it's much easier to say things about the numbers.
- What is the minimum?
- What is the maximum?
- Think of your own family, usually you and your parents and siblings. Make a guess about your household income and see if you can find where it would be in this list. (Don't tell us.)
- What is a typical household income? Suppose I go halfway through the list. I find the number 56,800 right in the middle. Half the numbers are below 56,800, half the numbers are above 56,800; it is called the median.
- How many incomes are in the 20,000 range? (That is, between 20,000 and 29,999.)
- How many incomes are in the 90,000 to 100,000 range?
- How many are above 400,000?
Slide 4. I have marked the minimum, median, and maximum of the sample in red. The average household income is 80,450.3, but that won't mean much to them now.
Slide 5. This is a histogram of the sample of 165 household income numbers. Each household income is shown; numbers on the horizontal axis are incomes, and numbers on the vertical axis count how often they occur in the sample. You see that there are two household income numbers above 400,000, just as we saw in the sorted list. The histogram shows how the numbers in the sample are distributed: most of them are below 80,000, not very many are above 200,000, and very few are above 400,000.
Slide 6. This is a display of the distribution using all 2014 US Census data, which is a picture of what is happening across the entire population, across all households. The distribution is smoother. Above about 30,000, each bar is lower than the last, meaning that there are fewer households in each successively higher income range. The vertical lines mark percentiles. 95% of households make less than $206,600, but 5% make more than that.
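
For anyone who would rather build a sample like this in Python, here is a minimal sketch of the idea behind the slides: pick a $5,000 range according to its share of households, pick a value inside that range, round to the nearest hundred, then sort and count. The bin weights are made up for illustration and are not Census figures. Drawing uniformly inside each range is my assumption, and the names simulate_incomes and histogram_counts are hypothetical. This is not a translation of the MATLAB script.

```python
import random
import statistics

BIN_WIDTH = 5_000

# Made-up relative weights for the ranges 0-4,999, 5,000-9,999, and so on.
# These are placeholders for illustration, NOT Census figures.
HYPOTHETICAL_WEIGHTS = [3, 3, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2]
HYPOTHETICAL_BINS = [(i * BIN_WIDTH, w) for i, w in enumerate(HYPOTHETICAL_WEIGHTS)]


def simulate_incomes(bins, n, bin_width=BIN_WIDTH, seed=None):
    """Draw n incomes: choose a bin by weight, then a uniform value inside it.

    Values are rounded to the nearest hundred, as on the slides.
    """
    rng = random.Random(seed)
    lowers = [lo for lo, _ in bins]
    weights = [w for _, w in bins]
    chosen = rng.choices(lowers, weights=weights, k=n)
    return [round((lo + rng.uniform(0, bin_width)) / 100) * 100 for lo in chosen]


def histogram_counts(values, bin_width):
    """Count how many values fall in each range [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        lo = (v // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


# Slide 3: sorting makes the minimum, median, and maximum easy to read off.
sample = sorted(simulate_incomes(HYPOTHETICAL_BINS, n=165, seed=1))
print("min:", sample[0], "median:", statistics.median(sample), "max:", sample[-1])

# Slide 5: a rough text histogram with one '#' per household.
for lo, count in histogram_counts(sample, 20_000).items():
    print(f"{lo:>7,} {'#' * count}")
```

Changing the seed gives a different sample of 165 numbers from the same distribution, which can be handy if you want each section of the course to see different data.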

Introducing the dot plot

Here is a fun way to introduce students to the idea of a dot plot, and also to show them the perils of "generating" random numbers in their heads.

Ask the students to think of a random number from 1 to 10. Choose one so that each number from 1 to 10 has the same probability of being chosen. Write down your number. That's very important. Write it down.

Now we are going to make a dot plot of the numbers you wrote down. Just watch me do it, then you'll be able to make one of your own the next time. I'm drawing a horizontal number line and marking the numbers 1, 2, 3, 4, ..., 10 under the line. Now we'll go around the room, and I want you to say, loudly, the number you wrote down. I'll draw a dot for each number, and the dots will stack up. Ready? What is the first number?

Usually, as you go around the room, you find that many students picked 3 or 7. Those are people's favorite "random" numbers. If people hadn't written down their number, they might want to change it once they see how many 3's and 7's there are. Hopefully this works out well for you. If you get lots of 3's and 7's, you can say that's no surprise, those are the numbers that people think of as being "random". In fact, when people try to cover up illegal transactions in accounting ledgers, they tend to use numbers that are skewed toward 3's and 7's, and there are other ways in which they generate the wrong numbers. One tool in the field of forensic accounting is to see if the numbers in an account book have an unusual distribution.

Now let's do the whole thing again. Once again, write down a random number from 1 to 10, in a way that every number has the same probability of being chosen. I'll make another dot plot, and this time everyone else can make a dot plot on their own paper. Don't worry if you miss some points. Draw a horizontal line, label the numbers 1, 2, 3, 4, ..., 10. Ready?

With any luck, no one will choose 3 or 7. That is also very funny. It shows how hard it is for the human brain to properly generate random numbers. So when we need to do that, we won't just do it in our heads.
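
If you want to follow up by letting a computer pick the numbers, a small text dot plot is enough to make the comparison. This is only a sketch: the function name dot_plot is hypothetical, and the list of "class" responses is invented to look like the pattern described above, not data from a real class.

```python
import random
from collections import Counter


def dot_plot(values, low=1, high=10):
    """Return a text dot plot: one stacked '*' per value above a number line."""
    counts = Counter(v for v in values if low <= v <= high)
    tallest = max(counts.values(), default=0)
    positions = range(low, high + 1)
    rows = []
    # Build the stacks from the top row down to the row just above the axis.
    for level in range(tallest, 0, -1):
        row = "".join(" * " if counts[x] >= level else "   " for x in positions)
        rows.append(row.rstrip())
    rows.append("---" * len(positions))
    rows.append("".join(f"{x:^3}" for x in positions).rstrip())
    return "\n".join(rows)


# Invented responses for illustration only (heavy on 3 and 7).
hypothetical_class = [7, 3, 7, 4, 3, 7, 8, 3, 7, 2, 6, 7, 3, 5, 7, 3, 9, 7, 4, 3]
print("Numbers from people's heads (invented example):")
print(dot_plot(hypothetical_class))

# The same number of picks made by a pseudorandom number generator.
rng = random.Random(0)
computer_picks = [rng.randint(1, 10) for _ in range(len(hypothetical_class))]
print("\nNumbers from the computer:")
print(dot_plot(computer_picks))
```

With only 20 picks the computer's plot will not be perfectly flat either, which is a good opening for talking about sampling variability.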

Coin flipping to illustrate the law of large numbers and confidence intervals

I created some HTML pages with graphics generated by JavaScript to illustrate how the sample proportion changes as you flip a coin repeatedly. The coin can be fair, or you can set the probability of heads, or you can generate a "mystery" coin with an unknown probability of landing heads, which is the most realistic situation when we collect data. A graph shows how the sample proportion settles down to the probability of heads as the number of flips increases.
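
The same idea can be sketched in a few lines of Python, without graphics. This is not the code behind the HTML pages; the function name running_proportions and its parameters are hypothetical, and it simply tracks the proportion of heads after each flip.

```python
import random


def running_proportions(p_heads, n_flips, seed=None):
    """Flip a coin n_flips times; return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flips_so_far in range(1, n_flips + 1):
        # rng.random() is uniform on [0, 1), so this is heads with probability p_heads.
        if rng.random() < p_heads:
            heads += 1
        proportions.append(heads / flips_so_far)
    return proportions


props = running_proportions(p_heads=0.5, n_flips=10_000, seed=42)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6,} flips: sample proportion = {props[n - 1]:.3f}")
```

For a "mystery" coin, set p_heads to a value drawn with random.random() and do not print it until students have made their guesses.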

A second page shows confidence intervals graphically each time you collect a sample of a given size. You can vary the probability of heads and vary the confidence level. This really helps students understand confidence intervals.
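
Here is a rough Python sketch of what such a simulation does: draw many samples, compute an interval from each, and count how often the interval captures the true probability of heads. I am assuming the usual large-sample interval for a proportion, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n); the HTML page may compute its intervals differently, and the function names here are hypothetical.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(heads, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a proportion."""
    p_hat = heads / n
    # Critical value z such that the middle `confidence` of the normal curve is kept.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p_heads, n, confidence, n_samples, seed=None):
    """Fraction of simulated samples whose interval contains p_heads."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        heads = sum(rng.random() < p_heads for _ in range(n))
        low, high = proportion_ci(heads, n, confidence)
        if low <= p_heads <= high:
            hits += 1
    return hits / n_samples


low, high = proportion_ci(heads=50, n=100, confidence=0.95)
print(f"50 heads in 100 flips: 95% interval from {low:.3f} to {high:.3f}")
print("share of 1,000 intervals that captured p = 0.5:",
      coverage(p_heads=0.5, n=100, confidence=0.95, n_samples=1_000, seed=7))
```

The share of intervals that capture the true value should come out near the confidence level, though not exactly equal to it, because this interval is only an approximation and the simulation itself is random.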

You can download all of the HTML pages about coin flipping.
