Probabilities and statistics

When you develop a sampling scheme you need to make sure that you are taking random samples that can be used to generalize to the population you are interested in making inferences about.

First off, make sure your hypothesis includes what population you are interested in (all people using the new community cultural center, all children enrolled in your tribe's language program) and the variables that you are going to measure (degree of satisfaction with the new facility, percentage of enrolled students completing the course). This will help you to set up the sampling scheme for your project evaluation.

Next, figure out what your constraints are-- how much time and money do you have available to conduct the evaluation? How much staff do you have to help? Things like this can really limit how many samples you can take.

Do you want to minimize the number of samples because you don't want to have a negative impact on your population? This can be important when you are worried that your data collection will affect members of your community-- you might not want to take up too much time if you are working with busy families, for example.

Once you have an idea of how your budget, time limitations, staff limitations and ethical considerations will constrain your sampling (that is, what you can't do) you can figure out what you can do-- how many samples will you take, will you take them at different times or in different places, and so on.

At the heart of your sampling scheme is the need to take representative samples of your population. This means that you need to have a good idea of what population you are working with in your project. The population is what you want to describe, what you will generalize to-- for example:

1) all users of a community food pantry during 2008

2) all Native American families in Lawrence with incomes below $40,000 per year who are in town during the summer months

Notice that in this example we describe the population we are interested in with information about who they are, where they live, what their income is, and other pieces of information that might be variables that we want to control in our study. If we were only interested in people using the food pantry in summer, for example, we could restrict our interviews to June-August and not worry about the rest of the year. If we do this, however, we can't say anything about use during November or February. Also, notice that in the first case we are only looking at people who actually use the food pantry, whereas in the second case we are also including people who might not use the food pantry.

To take a representative sample we need to make sure that the way we pick the people we will interview is unbiased, that is, we will randomly select participants from our population. There are many ways to pick random samples, but it is important that you are able to describe how you picked your sample and convince others that it was random. For example, you might take the list of people who have used the food pantry over the course of the year and assign each different family a number. Then you can go to a table of random numbers, pick 10 random numbers out of the table and find which family corresponds to which number (you can also use an online random number generator). That would give you a random sample of 10 families.
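The numbering procedure just described can be sketched in a few lines of Python. This is only an illustration: the list of 40 families and the names `families` and `pick_random_sample` are made up for the example, and the fixed seed is an assumption that lets you repeat the draw and show others exactly how the sample was picked.

```python
import random

def pick_random_sample(population, sample_size, seed=None):
    """Return `sample_size` distinct members of `population`, chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)  # no family is picked twice

# Hypothetical list: each family that used the pantry gets a number, 1 to 40.
families = [f"Family {n}" for n in range(1, 41)]

sample = pick_random_sample(families, 10, seed=2008)
print(sample)
```

Because `random.sample` never returns the same item twice, this corresponds to using each family's number only once.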

When we use the word "probability" in everyday life we are usually referring to one of two things-- the likelihood that we will get a certain result in a "game of chance" like cards or dice, or what we think is the likelihood that something will happen based on our past experience.

In the first case, the outcomes are highly constrained by the rules of the game. For example, you don't suddenly tilt the table while someone is rolling dice, or you don't add a card to the deck between deals. The conditions are the same (or at least as much as we can make them) when we roll the dice the first time and then roll them again. And all of the possible outcomes are known (there are only six possible outcomes when you roll one die). Because of this we can calculate the exact probability that different outcomes will happen. This is a good introduction to what we do when we set up a sampling scheme-- we will carefully define the population and all of the conditions we will use when we do the sampling, then we will use a random number generator to pick a random sample (the equivalent of rolling the dice or flipping a coin).

In the second case, when we talk about the probability of something happening in the "real world", we don't know all the possible outcomes and things aren't constrained by a set of rules like they are in a game. Instead we use "probability" to describe our prediction of the likely outcome based on our past experience. For example, we may have a lot of experience waiting for the elevator in Wescoe, and we might know that it is usually really slow. Given our past experience, we can make predictions about how long it will take to come after we call it-- and we might decide that it is faster to climb the stairs when we are late for class. This is a reasonable use of probabilities since at least part of the reason that the elevator comes when it does is based on random factors (someone just happened to call the elevator on the floor below you).

The BBC has posted a fun graphic of probabilities in real life that you can use to compare the likelihood that various things might happen to you.

In order to get a better understanding of what we mean by probabilities, let's review some simple probability problems. To begin, we need to know the probability of a single random (chance) event.

The relative frequency of an event = (frequency of that event) / (total number of possible events)

For example, the probability that you will get a "heads" when you flip a coin = 1/2 since it is one possible outcome out of two total possible outcomes (heads or tails). Or the probability that you will draw a 2 of clubs from a deck of 52 playing cards = 1/52
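As a rough sketch, the coin and card examples can be checked in Python by counting outcomes. The helper name `probability` and the way the deck is built are choices made for this illustration, and the calculation assumes every outcome is equally likely.

```python
from fractions import Fraction

def probability(favorable_outcomes, all_outcomes):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(favorable_outcomes), len(all_outcomes))

# A coin has two possible outcomes.
coin = ["heads", "tails"]
p_heads = probability(["heads"], coin)

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]
p_two_of_clubs = probability([("2", "clubs")], deck)

print(p_heads)         # 1/2
print(p_two_of_clubs)  # 1/52
```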

1. What is the probability that one event or another will happen?

First off we have to ask whether or not the events are mutually exclusive because this will affect the way we handle the probabilities. Events are called mutually exclusive during a single trial if only one of the outcomes is possible (you can get either a heads or a tails, you can't get both on one flip of the coin).

a. If you have mutually exclusive events, then you calculate the probability of getting one or the other by adding the probabilities

What is the probability of getting a heads or a tails with a single flip of a coin?

probability of getting a heads = 1/2

probability of getting a tails = 1/2

probability of getting a heads or a tails = 1/2 + 1/2 = 1

What is the probability of getting a 2 or a 3 on a roll of a single die?

probability of getting a 2 = 1/6

probability of getting a 3 = 1/6

probability of getting a 2 or a 3 = 1/6 + 1/6 = 2/6 = 1/3

b. If the events are not mutually exclusive (the events could both happen at the same time) and you want to find out the probability of one or the other happening then you have to modify the equation-- you still add the probabilities that each will occur, but now you have to make sure you're not doubling up on any of your possible events.

What is the probability of getting either an odd number or a number less than 4 on a single roll of a die?

probability of getting an odd number = probability of getting (1 or 3 or 5) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2

probability of getting a number less than 4 = probability of getting (1 or 2 or 3) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2

notice that if we just added these two together we would count some outcomes twice, since 1 and 3 occur in both sets, so we have to compensate for this by subtracting out the duplicates

probability of getting a number that is both odd and less than 4 = probability of getting (1 or 3) = 1/6 + 1/6 = 2/6 = 1/3

therefore the probability of getting either an odd number or a number less than 4 is

P(odd number) + P(number less than 4) - P(odd number and number less than 4) = 3/6 + 3/6 - 2/6 = 4/6 = 2/3

that is, if we get a 1 or 3 or 5 or 2 we will have a number that is either odd or less than 4, which is 4/6 of the possible outcomes (getting a 4 or a 6 wouldn't satisfy our condition, so 2/6 of the possible outcomes aren't either odd or less than 4)
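Both kinds of "or" question can be checked by treating the faces of the die as a set. This is a minimal sketch, assuming a fair six-sided die; the helper `p` and the variable names are just for this illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of a set of faces on one roll of a fair die."""
    return Fraction(len(event), len(die))

# Mutually exclusive events: just add the probabilities.
p_2_or_3 = p({2}) + p({3})

# Events that overlap: add, then subtract what was counted twice.
odd = {n for n in die if n % 2 == 1}   # {1, 3, 5}
less_than_4 = {n for n in die if n < 4}  # {1, 2, 3}
overlap = odd & less_than_4              # {1, 3}
p_either = p(odd) + p(less_than_4) - p(overlap)

print(p_2_or_3)  # 1/3
print(p_either)  # 2/3
```

Counting the faces in `odd | less_than_4` directly (1, 2, 3 and 5) gives the same 4/6, which is a handy way to check the subtraction.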

2. What is the probability that one event and another will happen?

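When we ask an "and" question (rather than the "or" questions above) we multiply the probabilities rather than adding them.

These events aren't mutually exclusive, since both can happen at the same time. Multiplying the individual probabilities works when the events are independent, that is, when the outcome of one doesn't change the probability of the other (as with separate coin flips or dice rolls).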

What is the probability that when you flip two coins you will get two heads?

(you get a heads and a heads)

probability of getting a heads = 1/2

probability of getting two heads = 1/2 x 1/2 = 1/4

(H,H) (H,T) (T,H) (T,T) are the four possible outcomes; getting 2H is 1/4 of the possible outcomes, getting one heads and one tails is 2/4 of the possible outcomes (H,T) and (T,H) and getting 2T is 1/4 of the possible outcomes

What is the probability that when you roll 2 dice you will get two 3's?

probability of getting one 3 = 1/6

probability of getting two 3's = 1/6 x 1/6 = 1/36
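One way to convince yourself that multiplying is right is to list every possible pair of outcomes and count. The sketch below does that with `itertools.product`; the variable names are only for illustration, and it assumes fair coins and dice.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of flipping two coins: (H,H) (H,T) (T,H) (T,T)
coin_pairs = list(product("HT", repeat=2))
p_two_heads = Fraction(coin_pairs.count(("H", "H")), len(coin_pairs))

# Every ordered outcome of rolling two dice: 6 x 6 = 36 pairs
dice_pairs = list(product(range(1, 7), repeat=2))
p_two_threes = Fraction(dice_pairs.count((3, 3)), len(dice_pairs))

print(p_two_heads)   # 1/4, the same as 1/2 x 1/2
print(p_two_threes)  # 1/36, the same as 1/6 x 1/6
```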

Sampling with and without replacement

One of the things that you will have to decide when you set up a sampling scheme is whether you will sample with replacement or without replacement. If you sample without replacement it means that you draw a card and don't replace it in the deck (or you draw a name or number and only use it once). Sampling with replacement means that you put the card back before drawing again. This determines how you calculate the probability of occurrence for each subsequent event.

What is the probability that you draw the King of Spades, then replace it in the deck and draw the Queen of Hearts, then replace it in the deck and draw the Ace of Diamonds?

1/52 x 1/52 x 1/52 = 1/140,608 = 7.1 x 10^-6

The probabilities don't change because each trial is independent, that is, you sampled with replacement so the total number of cards does not change.

What happens if you don't replace them in the deck after you draw them?

You draw the King of Spades, the Queen of Hearts and the Ace of Diamonds--

1/52 x 1/51 x 1/50 = 1/132,600 = 7.5 x 10^-6

Notice that the probabilities change when you sample without replacement, because the deck has one less card after each draw (I still wouldn't bet money on it!)
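The two card calculations differ only in whether the deck shrinks between draws. Here is a small sketch of that idea; the function name `p_named_cards_in_order` and its parameters are invented for this example.

```python
from fractions import Fraction

def p_named_cards_in_order(n_draws, deck_size=52, replace=True):
    """Probability of drawing n specific, different cards in a given order."""
    p = Fraction(1)
    remaining = deck_size
    for _ in range(n_draws):
        p *= Fraction(1, remaining)  # one specific card out of those left
        if not replace:
            remaining -= 1           # the deck shrinks by one card
    return p

with_replacement = p_named_cards_in_order(3, replace=True)
without_replacement = p_named_cards_in_order(3, replace=False)

print(with_replacement)     # 1/140608
print(without_replacement)  # 1/132600
```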

A better bet might be--

What is the probability that you draw a Spade, then a Heart, then a Diamond?

(you might notice that this is sampling without replacement)

13/52 x 13/51 x 13/50 = 2,197/132,600 = 0.0166 (which is about a 1.7% chance of happening)
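If you want to double-check this result, a short Python sketch can list every ordered way of drawing three cards without replacement and count the ones that go Spade, Heart, Diamond. The deck layout and variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import permutations

suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]  # 52 cards

# The calculation from the text.
exact = Fraction(13, 52) * Fraction(13, 51) * Fraction(13, 50)

# Brute force: every ordered draw of 3 different cards (52 x 51 x 50 of them).
wanted = ("spades", "hearts", "diamonds")
draws = 0
hits = 0
for hand in permutations(deck, 3):
    draws += 1
    if tuple(suit for _, suit in hand) == wanted:
        hits += 1
counted = Fraction(hits, draws)

print(hits, "out of", draws)  # 2197 out of 132600
print(float(exact))
```

Python reduces the fraction to lowest terms, so `exact` is stored as 169/10200, which is the same value as 2,197/132,600.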

Here are some cool internet tools for exploring probabilities like the ones we just explored:

An online virtual coin flipper

Spin the spinner

Roll the dice and see the probability distribution

And here is a good discussion of independent events. The result of flipping a coin is independent of the result of the previous coin flip, and the result of rolling a die is independent of the result of the previous die roll, regardless of how it may seem when you are doing it many times. This takes us to the Gambler's Fallacy.
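You can also explore independence yourself with a small simulation. The sketch below flips a simulated fair coin many times and looks only at the flips that come right after three heads in a row. The function name, the number of flips and the seed are all arbitrary choices for this illustration; if the flips are independent, heads should still come up about half the time.

```python
import random

def heads_rate_after_streak(n_flips, streak_length, seed=None):
    """Share of heads among flips that directly follow a run of heads."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    followups = [
        flips[i]
        for i in range(streak_length, n_flips)
        if all(flips[i - streak_length:i])  # the previous flips were all heads
    ]
    return sum(followups) / len(followups)

rate = heads_rate_after_streak(200_000, 3, seed=1)
print(f"{rate:.3f}")
```

The printed value should land close to 0.5: a streak of heads does not make tails "due" on the next flip.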

The card images used on this page were created from the SVG cards by David Bellot