Final Prep

What you are studying [modules 1 & 3]:

1. What is the population of interest in this study?

2. What are the two variables in this scenario?

3. Is each categorical or quantitative?

4. Is there a predicted explanatory variable and response variable? Which is which?

5. Are there two separate treatment groups or is this a matched pairs situation?

6. What kind of graph can help you determine if there is a relationship between the variables?

7. Based on this graph, do you predict that there is a link between the two variables? Why?

8. Based on the design of this scenario, is this an experiment or an observational study?

9. IF it is an experiment, is blocking or matched pairs used? Why?

10. IF it is an experiment, is blinding used in the design? Why?

ONLY if you have two quantitative variables [module 2]:

11. What is the equation of the regression line?

12. What is the slope? Explain it in a sentence.

13. Use the regression line to predict a value of y for a given value of x.

14. Find the residual of a predicted point. Sign (+/-) matters.

15. What is the correlation coefficient? Does it suggest a strong relationship between the variables?
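Questions 11 to 15 can be practiced by hand with a short script. The sketch below is only an illustration: the data points are invented, and the helper names (fit_line, predict, residual) are hypothetical, not calculator or StatKey commands.

```python
# Hypothetical (x, y) data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
    return slope, intercept, r

slope, intercept, r = fit_line(xs, ys)

def predict(x):
    """Predicted y on the regression line for a given x."""
    return slope * x + intercept

def residual(x, y_observed):
    """Residual = observed - predicted, so the sign matters."""
    return y_observed - predict(x)

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("r:", round(r, 3))
print("residual at x = 3:", round(residual(3.0, 6.2), 3))
```

A positive residual means the observed point sits above the line; a negative one means it sits below.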

Hypothesis testing [modules 2 & 4]:

16. Which test is appropriate for the two variables you have?

17. What are the appropriate null and alternative hypotheses for this situation? Use the correct symbols (β, μ, or p).

18. What is needed for this data to be statistically significant at α = 0.05?

19. What is the p-value for the appropriate test?

20. What does this p-value actually tell you (in a sentence)?

21. Do you reject the null hypothesis? Is your data statistically significant?

22. As a result, can you say that the explanatory variable causes the response, that there is a link, or in this situation can you not find a link? Why?

23. Why is zero in the middle of the StatKey graph for all of the hypothesis tests?
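One way to think about question 23 is to build a randomization distribution yourself. The sketch below assumes a shuffle-the-group-labels approach for a difference in means; StatKey's own procedure may differ in detail, and the data and function names here are invented for illustration. Because the shuffling acts as if the null hypothesis (no difference) is true, the simulated differences pile up around zero.

```python
import random

# Hypothetical data for two groups, invented for illustration only.
group_a = [12, 15, 14, 10, 13, 16]
group_b = [11, 9, 14, 10, 8, 12]

def mean(values):
    return sum(values) / len(values)

def randomization_diffs(a, b, reps=5000, seed=0):
    """Shuffle the pooled values into two new groups and record the
    difference in means each time (simulates a world where H0 is true)."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diffs.append(mean(new_a) - mean(new_b))
    return diffs

def two_sided_p_value(a, b, reps=5000, seed=0):
    """Fraction of shuffled differences at least as extreme as the observed one."""
    observed = mean(a) - mean(b)
    diffs = randomization_diffs(a, b, reps, seed)
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    return extreme / reps

diffs = randomization_diffs(group_a, group_b)
print("center of randomization distribution:", round(mean(diffs), 3))
print("two-sided p-value:", two_sided_p_value(group_a, group_b))
```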

Confidence intervals around an estimate [modules 2 & 4]:

24. StatKey takes in your raw data and spits out a distribution (that often looks unimodal and symmetric). What is going on behind the scenes with two-variable confidence intervals in StatKey?

25. Use a confidence interval (90%, 95%, or 99%) to determine how different the two treatment groups are.

26. Describe your confidence interval in a sentence.
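For question 24, a rough picture of what happens behind the scenes is a bootstrap: resample each group with replacement many times, record the difference each time, and take the middle 90%, 95%, or 99% of those differences. The sketch below assumes a percentile bootstrap for a difference in means; the data and the function name are invented, and StatKey's exact method may differ.

```python
import random

# Hypothetical data for two treatment groups, invented for illustration only.
group_a = [12, 15, 14, 10, 13, 16]
group_b = [11, 9, 14, 10, 8, 12]

def mean(values):
    return sum(values) / len(values)

def bootstrap_diff_ci(a, b, level=0.95, reps=5000, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group separately, with replacement, at its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    tail = (1 - level) / 2
    low = diffs[int(tail * reps)]
    high = diffs[int((1 - tail) * reps) - 1]
    return low, high

low, high = bootstrap_diff_ci(group_a, group_b)
print("95% interval for the difference in means:", round(low, 2), "to", round(high, 2))
```

If the interval contains 0, the data are consistent with no difference between the two groups at that confidence level.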

OLD THINGS

Part 2 Solutions

    1. 940/1140

    2. 320/610

    3. .261 to .338 (.300 +/- .038)

    4. -.032 to .228 (.098 +/- .130)

    5. Yes there is evidence, p = .024 (remember to do one sided test)

    6. -.128 to .397 (.134 +/- .263)

    7. skip

    8. skip

    9. 30.5

    10. d

    11. b

    12. c

    13. a

    14. skip

    15. c

    16. a

    17. b

    18. b

    19. a

    20. d (I wouldn't expect you to know this one)

Part 1 Solutions

1: C:3 vs. Q -- ANOVA

2: C:2 vs. C:2 -- 2-Proportion

3: C:2 vs. Q, matched pairs -- 1-Sample

4: C:2 vs. Q -- 2-Sample

5: Q vs. Q -- LinReg

6: C:2 vs. Q, matched pairs -- 1-Sample

7: C:3 vs. C:2 -- Chi Squared

8. Unions:

    • a:

    • b: Yes – there is a clear linear relationship.

    • c: y = -.406x + 825.

    • d: The slope is -.406 percent per year. This means that each year, about four tenths of a percentage point less of the population is in a union.

    • e: The intercept is 825. This means that around the birth of Christ, 825% of the population was in a union. Since that doesn't make a lot of sense, all the y-intercept does in this problem is make sure that the predicted percents for years in the 1900s make sense.

    • f: Yes – there is some slight curvature, but it is small and irregular, so a linear model makes sense.

    • g: r = -.980. This is a very strong negative linear relationship.

    • h: r² = .960. Thus, the year explains 96.0% of the variation of what percent of workers are in a union.

    • i: (-.406)(1990) + 825.384 = 17.4%

    • j: (-.406)(2012) + 825.384 = 8.51%. WikiAnswers said 9.4%, so 8.51% is not a bad guess. Note that linear predictions of things like percentages tend to under-estimate when they get close to 0%.

    • k: (-.406)(2035) + 825.384 = -0.83%. Since you cannot have a negative percent of the population in unions, this estimate is obviously too low. However, the actual answer will probably approach 0%.
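The arithmetic in parts i through k can be checked with a few lines of Python. This is just a sketch using the slope and intercept quoted above (-.406 and 825.384); the function name is made up for illustration. The last line computes the residual for 2012 using the 9.4% figure from part j.

```python
SLOPE = -0.406       # percent per year, from part c
INTERCEPT = 825.384  # from parts i-k

def predicted_union_percent(year):
    """Predicted percent in a union from the regression line."""
    return SLOPE * year + INTERCEPT

for year in (1990, 2012, 2035):
    print(year, round(predicted_union_percent(year), 2))

# Residual = observed - predicted (observed 9.4% for 2012, from part j).
residual_2012 = 9.4 - predicted_union_percent(2012)
print("2012 residual:", round(residual_2012, 2))
```

The positive residual for 2012 matches the note in part j that the line under-estimates as the percent gets close to 0%.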

    • 9. pA = 44/58, pB = 38/57 (break the proportions into A and B, then make the number of people who said it was "good" your "success")

    • 10. Null: pA = pB

    • Alt: pA ≠ pB (no direction indicated, so use a two-sided "not equals" test)

    • α = .05

    • 11. OK -- at least 5 successes and failures in each proportion

    • 12. p = .276, fail to reject, not significant

    • 13. When using "plus four" your props become: pA = 45/60, pB = 39/59

    • -.125 to .303

    • 14. .089 ± .214 becomes 8.9% ± 21.4%

    • I'm 99% confident that the percent of people who rated mouse A "good" is between 12.5 percentage points lower and 30.3 percentage points higher than the percent who rated mouse B "good".
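The numbers in 12 through 14 can be reproduced with the standard two-proportion formulas. The sketch below assumes a pooled z-test for the p-value and a "plus four" interval that adds one success and one failure to each group; the function names are made up for illustration.

```python
from statistics import NormalDist

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided z-test of H0: pA = pB using the pooled proportion."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def plus_four_interval(x_a, n_a, x_b, n_b, level=0.99):
    """'Plus four' interval for pA - pB: add 1 success and 1 failure per group."""
    p_a = (x_a + 1) / (n_a + 2)
    p_b = (x_b + 1) / (n_b + 2)
    se = (p_a * (1 - p_a) / (n_a + 2) + p_b * (1 - p_b) / (n_b + 2)) ** 0.5
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_a - p_b
    return diff - z_star * se, diff + z_star * se

z, p_value = two_proportion_z_test(44, 58, 38, 57)
low, high = plus_four_interval(44, 58, 38, 57, level=0.99)
print("p-value:", round(p_value, 3))
print("99% interval:", round(low, 3), "to", round(high, 3))
```

Since the interval contains 0, it agrees with the "fail to reject" decision in 12.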

15. Hip replacement

    • a) People who are receiving a hip replacement

    • b) What type of hip procedure you get

    • c) (1) new hip replacement procedure, (2) existing common procedure, (3) no procedure [the baseline control group]

    • d) How naturally the subjects walk (as rated 1 to 10 by physical therapists) 2 months after surgery -- quantitative

    • e) Exercise / therapy (the no procedure group would likely do no extra work if they knew they were a control group)

    • f) Study director would randomly assign patients to the treatments. The patient should be blinded -- this means that they would need a fake surgery for the control group. The doctor should also be blinded until they need to actually perform any procedure.

    • g) If the patient had issues with both hips, you could compare two of the treatments in the same person. This might not be such a great idea though and might confound two treatments when the therapist judges the walking.

    • h) If you blocked by prior walking ability, then each treatment group would be more likely to be about equal.

    • i) Yes -- you would just be up front about what the different possible treatments are, and as long as they understand that they might get the sham surgery with nothing done, it is very ethical.

    • j) Patients might not do their recommended exercise each day as part of the treatment process.

16. Pharm. company:

    • a) Observational study -- there are two treatments, but patients chose which one they want, so the best you can do is to observe how they each do. There are multiple variables now being confounded, making it impossible to determine causation

    • b) Control: either drug acts as a control / comparison to the other -- no clear control

    • Randomization: none -- this is the problem!

    • Repetition: only in the single setting with the unknown number of volunteers.

    • c) Unknown.

    • d) The 2 varieties of the drug

    • e) number of allergic reactions in a given time period

    • f) With 2 treatments (categorical) and a quantitative response variable, you would use a 2-sample T-Test

    • g) Since this is only an observational study, it will be harder to claim what is the cause. Assuming the population the volunteers came from is all residents of the Rochester area, then the study conclusions would show that, when patients choose the best fitting medicine, the group using one of the medications improved more than the other.

17. Coming soon

18. Card tower:

    • a) Experiment: 2 different groups (and their only difference is the strategy used), volunteers are randomly assigned to each group

    • b) "People" or volunteers: since it is a student group, it probably refers to other students, but it does not say

    • c) Null: the time required to build the tower with the two different strategies is the same

    • Alt: the time required to build the tower with the two different strategies is different (because we don't know which one is supposed to be "better")

    • H0: μ1 = μ2

    • HA: μ1 ≠ μ2

    • d) 2-SampTTest: There are two groups, a quantitative response variable, and no matched pairs.

    • e) p = 0.989

    • f) Use 2-SampTInt (@ 95% confidence): -18.92 to 19.167

    • g) The data suggests that the 2 strategies are nearly identical. There is not a shred of evidence to support that they are different in the student population at this school.

19. Leaderboard:

    • a) Experiment: there are 2 groups being compared and both groups are the same (except for the one treatment -- leaderboard vs. no leaderboard). There is random assignment between the two groups.

    • b) Same as #1 -- probably student volunteers.

    • c) Null: people are equally fast whether there is a leaderboard or not

    • Alt: people take different amounts of time when some use a leaderboard (this is because it does not suggest a direction -- if it was your experiment and you thought that the leaderboard would make you faster, then you could do a one-sided test instead of just saying they are "different").

    • H0: μ1 (LB) = μ2 (no LB)

    • HA: μ1 (LB) ≠ μ2 (no LB)

    • d) 2-SampTTest: There are two separate groups and no matched pairs.

    • e) p = 0.138

    • f) Use 2-SampTInt (@ 95% confidence) of leaderboard - no_leaderboard: -6.773 to 0.973 (remember that negative is "good" because it means that the leaderboard group is faster).

    • g) There is not enough evidence to conclude that there is a difference in how long the groups took. In this sample the leaderboard group was faster on average, but the interval includes 0, so that difference could be due to chance.

Problem 20: see #13 in this file

Problem 21:

  • 22. Coin result (categorical): Options: heads, tails

  • Die result (categorical): Options: 1,2,3,4,5,6

  • 23. 11/36

  • 24. 7/37

  • 25. P(3|heads)

    • 26. 2/73, the probability of flipping a heads and rolling a four

    • 27. 4/10, the probability of flipping a tails given that you rolled a two

    • 28. 48/73, the probability of flipping a tails or rolling a six (or both)

    • 29. 8/37, the probability of rolling a two or four given that you flipped a heads
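Answers 26 through 29 all come from reading a two-way table of coin and die results. The sketch below shows the same kinds of calculations (and, or, given) on a table of counts. The counts are invented for illustration and are NOT the table from the problem, so the fractions will not match the answers above.

```python
from fractions import Fraction

# Hypothetical counts of (coin, die) outcomes -- not the problem's actual table.
DIE_FACES = (1, 2, 3, 4, 5, 6)
counts = {}
for face, n in zip(DIE_FACES, (3, 2, 4, 1, 2, 3)):
    counts[("heads", face)] = n
for face, n in zip(DIE_FACES, (2, 3, 1, 4, 2, 3)):
    counts[("tails", face)] = n

def anything(outcome):
    """Default condition: every outcome in the table counts."""
    return True

def prob(event, given=anything):
    """P(event | given) as an exact fraction of the table counts."""
    given_total = sum(n for o, n in counts.items() if given(o))
    both = sum(n for o, n in counts.items() if given(o) and event(o))
    return Fraction(both, given_total)

def is_heads(o):
    return o[0] == "heads"

def is_tails(o):
    return o[0] == "tails"

def rolled(*faces):
    """Build an event that is true when the die shows one of the given faces."""
    return lambda o: o[1] in faces

# "and": both conditions hold
p_heads_and_4 = prob(lambda o: is_heads(o) and rolled(4)(o))
# "given": restrict the table to the condition first
p_tails_given_2 = prob(is_tails, given=rolled(2))
# "or": either condition (or both) holds
p_tails_or_6 = prob(lambda o: is_tails(o) or rolled(6)(o))
p_2_or_4_given_heads = prob(rolled(2, 4), given=is_heads)

print(p_heads_and_4, p_tails_given_2, p_tails_or_6, p_2_or_4_given_heads)
```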

30-end: coming soon

Pick your test

http://www.ltcconline.net/greenL/java/Statistics/StatsMatch/StatsMatch.htm
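The pairings listed in the Part 1 solutions (items 1 to 7) can also be written as a small lookup. This is only a sketch: the function name and arguments are made up, and treating "more than 2 groups" the same way as the C:3 examples is an assumption that generalizes from those examples.

```python
def pick_test(explanatory, response, explanatory_groups=2, matched_pairs=False):
    """Suggest a test from variable types.

    explanatory, response: "C" (categorical) or "Q" (quantitative)
    explanatory_groups: number of categories in the explanatory variable
    matched_pairs: True if each subject gives a pair of measurements
    """
    if explanatory == "Q" and response == "Q":
        return "LinReg"
    if explanatory == "C" and response == "Q":
        if matched_pairs:
            # Work with the differences within each pair, then one sample.
            return "1-Sample"
        # Assumption: 3 or more groups all go to ANOVA, like the C:3 example.
        return "2-Sample" if explanatory_groups == 2 else "ANOVA"
    if explanatory == "C" and response == "C":
        # Assumption: more than 2 groups goes to Chi Squared, like the C:3 example.
        return "2-Proportion" if explanatory_groups == 2 else "Chi Squared"
    raise ValueError("combination not covered in the Part 1 solutions")

print(pick_test("C", "Q", explanatory_groups=3))  # item 1
print(pick_test("C", "Q", matched_pairs=True))    # items 3 and 6
```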