1. Two Variable Introduction

Learning objectives (and summaries)

Distinguish different types of two variable scenarios and make a prediction on the variables' independence and causality.

  • Understand the difference in purpose of analyzing two variables instead of one variable

    • When analyzing one variable, you often want to estimate a population parameter from a sample. You look for a proportion or an average.

    • When analyzing two variables, you often want to determine how they are related so you can use one value to predict another. You look for correlation, a difference in means, or a difference in proportions.

  • Understand the meaning of independence/dependence of variables

    • Independent variables have no effect on each other. For example, if knowing someone's height does not help you predict their GPA, or vice versa, then those two variables are independent.

  • Using a scatter plot, identify when two quantitative variables are dependent

    • If there is a pattern / trend, the variables are dependent. Use StatKey's "Descriptive Stats for 2 Quantitative Variables".

  • Using a bar graph or a two way table, identify when two categorical variables are dependent

    • If the fraction of each bar that is a given color is the same for both bars, then the variables are independent. Otherwise, they are dependent. Use StatKey's "Descriptive Stats for 2 Categorical Variables".

  • Using stacked box plots or the means of different groups, identify when a categorical variable and a quantitative variable are dependent

    • If one box plot is shifted from the other, or the means are significantly different (relative to the scale of the other numbers), then they are dependent. Otherwise, they are independent. Use StatKey's "Descriptive Stats for 1 Quantitative and 1 Categorical Variable".

  • Understand the difference between correlation and causation and why variables that are correlated may have lurking variables that cause both

    • Correlation = the variables are dependent / linked. Causation = you know that one of the variables is the reason that the other variable changed. Without proving causation (usually through an experiment), there is always a chance that both variables are being influenced by something else (a "lurking" variable).
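The bar graph / two way table check for two categorical variables can be sketched in a few lines of Python. This is only an illustration of the idea, not part of StatKey: the function name `conditional_proportions` is hypothetical, and the counts are the ones given in practice problem 7 (grades by gender).

```python
def conditional_proportions(table):
    """For each group (row), return the share of each outcome within that group."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {outcome: n / total for outcome, n in counts.items()}
    return result


# Counts from practice problem 7: number of A's and B's for boys and girls.
grades = {
    "boys": {"A": 6, "B": 3},
    "girls": {"A": 10, "B": 5},
}

shares = conditional_proportions(grades)
for group, row in shares.items():
    # Each row is the "fraction of the bar" taken up by each grade.
    print(group, {outcome: round(p, 3) for outcome, p in row.items()})
```

If the rows print (nearly) the same fractions, the variables look independent; if the fractions differ noticeably between rows, they look dependent.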

Assessment

    • Test (14pts): 11 questions (6 MC, 4 MC+explain, 1 fill-in), AND 1 of these free response questions (3pts):

      • What does it mean for two variables to be dependent?

      • Once you know two variables are correlated, why do you need an experiment to prove that one variable caused the other variable?

      • The goal of analyzing the relationship between two variables is different than the goal when working with only one variable at a time. Explain.

Instruction

Guided Notes

Practice

For problems 1-5, answer the following:

a) List the 2 variables and whether they are categorical or quantitative.

b) Which section would you use in StatKey to create a chart / graph?

c) Which variable is likely the cause and which is likely the response? If neither, what might a lurking variable be that connects these two? Which input leads to which output?

1. Premium gasoline (89 octane) gives cars better gas mileage than regular gasoline (87 octane).

2. The weekly grocery bill is associated with the number of family members.

3. Taking a recently developed pill each day will reduce the number of headaches experienced over the next 3 months compared to another brand.

4. A professional sports team’s winning percentage is associated with the team’s average salary.

5. A classroom poll asked students whether or not they liked math and recorded which class each student was enrolled in.

For problems 6-12, answer the following:

a) List the 2 variables and whether they are categorical or quantitative

b) Using the appropriate chart/graph in StatKey, enter the data and observe the patterns.

c) Does there appear to be a relationship between the variables (are they dependent on each other) based on the graph/chart? In the case of one categorical and one quantitative variable, does the average change based on the category?

d) Take your best guess at the explanatory variable (the cause), and if not present, a possible lurking variable.

6. A server who suggests the most popular appetizer to customers at Restaurant A will make more appetizer sales than a server who asks “can I start you out with an appetizer?” They tested this and here is the data:

Appetizer sales for servers who suggested the most popular appetizer over a weekend shift: 3, 3, 0, 5, 2, 1, 2, 0, 2

Appetizer sales for servers who asked “can I start you out with an appetizer?”: 4, 1, 0, 0, 2, 3, 0, 0, 2

7. Does gender affect grades? A teacher had 6 boys with A's, 3 boys with B's, 10 girls with A's, and 5 girls with B's.

8. Does the age at which a person stopped smoking affect their cumulative risk of lung cancer? A study was performed to explore the link. Data is in (age, risk of lung cancer), one person per ordered pair:

(0, 0.2) (30, 1.1) (40, 2.6) (50, 5.6) (60, 11.1) (75, 15.7)

9. Does your favorite subject relate to whether or not you eat meat?

10. A student group decided to compare how well players did using two different strategies of building a card tower. The subjects were first instructed on a specific method they needed to use for their tower and were told that using this strategy was required. Since the group didn’t want players to mix strategies, they tested two completely separate groups of people. People volunteered to play and were randomly assigned to a strategy on the day of the experiment. The results:

Strategy 1 (seconds required to build the tower): 33, 42, 59, 68, 73, 91, 33, 45

Strategy 2 (seconds required to build the tower): 73, 33, 49, 62, 65, 48, 47, 66

11. Does the annual average price of milk relate to the US GDP? In 2003, milk cost $2.76 and GDP was 2.3. In 2006, milk was $2.56 and GDP was 2.2. In 2009, milk was $3.78 and GDP was 5.6. In 2012, milk was $3.70 and GDP was 5.7.

12. Some people think that the owner's gender is a good predictor of whether or not a car has new speakers or the standard (stock) speakers. Some sample data is below:

Practice solutions

    1. a) gas type (categorical) vs. gas mileage (quantitative)

       b) use "one quantitative and one categorical"

       c) type of gas should cause gas mileage to change, and premium gas should result in higher mileage

    2. a) grocery bill (quantitative) vs. family size (quantitative)

       b) use "two quantitative variables"

       c) family size causes grocery bill, and larger family = larger bill

    3. a) pill type (categorical) vs. number of headaches (quantitative)

       b) use "one quantitative and one categorical"

       c) pill type should cause the number of headaches, and the new pill should result in fewer headaches

    4. a) winning percentage of team (quantitative) vs. average salary of team (quantitative)

       b) use "two quantitative variables"

       c) causation could run either way: a higher salary can cause a higher win percentage (by buying better players with higher salaries), OR a better win percentage can raise salaries (if players are rewarded for playing well)

    5. a) attitude towards math (categorical) vs. class enrolled in (categorical)

       b) use "two categorical variables"

       c) this one is tricky -- I would actually bet that math ability is the lurking variable that determines which class they are in and their attitude towards math

    6. a) what the server says (categorical) vs. number of appetizer sales per server (quantitative)

       b) use "one quantitative and one categorical"

       c) yes -- the average changes quite a bit from the group with a suggestion to the group without

       d) the suggestion is probably causing the sales to increase (but a possible lurking variable is the quality of the server -- it may both cause more sales and cause the person to suggest an appetizer)
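The claim in problem 6 that the average changes between the two groups can be checked directly from the data given in the problem. The sketch below uses Python's standard `statistics` module rather than StatKey; the variable names are made up for this example.

```python
from statistics import mean

# Appetizer sales per server, copied from practice problem 6.
suggested = [3, 3, 0, 5, 2, 1, 2, 0, 2]  # server suggested the most popular appetizer
asked = [4, 1, 0, 0, 2, 3, 0, 0, 2]      # server asked the generic question

mean_suggested = mean(suggested)
mean_asked = mean(asked)
difference = mean_suggested - mean_asked

print("mean (suggested):", round(mean_suggested, 2))
print("mean (asked):", round(mean_asked, 2))
print("difference in means:", round(difference, 2))
```

A difference in means that is large relative to the size of the numbers themselves is the signal that the categorical variable and the quantitative variable are dependent.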

    7. a) gender (categorical) vs. grade (categorical)

       b) use "two categorical variables", and create your table like so:

                  A's   B's
          Boys     6     3
          Girls   10     5

       c) no -- there is a .667 chance of getting an A in general, a .667 chance of getting an A if you are a boy, and a .667 chance of getting an A if you are a girl, so gender has no effect.

       d) not relevant -- no linkage exists

    8. a) when a person stops smoking (quantitative) vs. cumulative risk of lung cancer (quantitative)

       b) use "two quantitative variables" -- note the pattern in the data

       c) yes! There is a clear pattern formed in the scatter plot (upwards)

       d) age of quitting explains risk of cancer, and the sooner you quit, the lower your risk of lung cancer
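The upward pattern in problem 8 is judged by eye from the scatter plot, but it can also be summarized with a correlation coefficient. The sketch below computes Pearson's r by hand for the six data points from the problem; `pearson_r` is a helper written for this example, not a StatKey feature.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (age, risk of lung cancer) pairs copied from practice problem 8.
pairs = [(0, 0.2), (30, 1.1), (40, 2.6), (50, 5.6), (60, 11.1), (75, 15.7)]
ages = [age for age, _ in pairs]
risks = [risk for _, risk in pairs]

r = pearson_r(ages, risks)
print("correlation:", round(r, 2))
```

A value near +1 or -1 indicates a strong linear trend, and a value near 0 indicates little or no linear trend. Here r comes out strongly positive (roughly 0.9), which matches the upward pattern. A strong correlation still says nothing about causation on its own.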

    9. a) favorite subject (categorical) vs. if you eat meat (categorical)

       b) use "two categorical variables"

       c) sort of -- it is not exactly the same chances when broken down one subject at a time, but at least visually it doesn't seem convincing that the differences are more than randomness.

       d) there really isn't any strong reason for a linkage, but maybe people who are more logical than feeling will both like math and not be opposed to eating meat

    10. a) which strategy was used (categorical) vs. time to build the tower (quantitative)

        b) use "one quantitative and one categorical"

        c) the means are almost identical, so the center doesn't really move. The two have a huge difference in spread -- strategy 2's standard deviation is much lower and its box plot is much more squished.

        d) the strategy causes the differences in time -- we can conclusively say this because it was a carefully designed experiment (which you will learn about in a future module).

    11. a) price of milk (quantitative) vs. US GDP (quantitative)

        b) use "two quantitative variables"

        c) yes, there appears to be a relationship between the price of milk and the GDP

        d) looking at the graph, it seems that the GDP causes the price of milk to fluctuate. However, there could be a lurking variable such as inflation or scarcity of milk, so we cannot be 100% sure that the GDP explains the price of milk.

    12. a) owner's gender (categorical) vs. type of speakers (categorical)

        b) use "two categorical variables"

        c) yes, there appears to be a weak relationship between gender and type of speaker

        d) it seems that what gender you are could determine what kind of speakers you have in your car. It isn't strong enough to say that for sure, so we would have to do more research.
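Two of the numeric claims in the solutions above can be double-checked from the data in the problems: the center and spread comparison in problem 10, and the milk/GDP relationship in problem 11. The sketch below is one way to do this in Python instead of StatKey. The helper `pearson_r` and all variable names are assumptions made for this example, and the standard deviation used is the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

# Problem 10: seconds required to build the tower under each strategy.
strategy_1 = [33, 42, 59, 68, 73, 91, 33, 45]
strategy_2 = [73, 33, 49, 62, 65, 48, 47, 66]

mean_1, mean_2 = mean(strategy_1), mean(strategy_2)
sd_1, sd_2 = stdev(strategy_1), stdev(strategy_2)  # sample standard deviations

print("means:", round(mean_1, 2), round(mean_2, 2))
print("standard deviations:", round(sd_1, 2), round(sd_2, 2))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Problem 11: milk price and GDP figures for 2003, 2006, 2009, 2012.
milk_price = [2.76, 2.56, 3.78, 3.70]
gdp = [2.3, 2.2, 5.6, 5.7]

r_milk = pearson_r(milk_price, gdp)
print("milk/GDP correlation:", round(r_milk, 2))
```

For problem 10 the two means come out almost equal while strategy 2's standard deviation is smaller, and for problem 11 the correlation is close to +1. With only four data points in problem 11, that correlation should be read as a rough description, not as strong evidence.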

Notes

http://youtube.com/watch?v=2JB09WbZiGg

http://youtube.com/watch?v=W2fIURsjQu8