Exploring Associations and Group Comparisons

Unit Overview

In this unit, we'll examine relationships between variables in your dataset to determine if there is a statistically meaningful association between them. Through the process of hypothesis testing, you'll gain greater clarity on what is interesting about the data you are exploring. This will help you focus your analysis and determine what you'll want to say about your data in your final research brief.

Key Terms

Hypothesis: a statement making a prediction about a characteristic of a population.

Significance test: evaluates evidence observed in a dataset about a hypothesis.

Null hypothesis: a statement assuming that there is an absence of effect.

Alternative hypothesis: a statement that an effect is present, meaning the true value differs from what the null hypothesis assumes.

Test statistic: a value calculated from the sample data that is compared to a critical value in hypothesis testing to decide whether to reject or fail to reject the null hypothesis.

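P-value: the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (conventionally below 0.05) is treated as evidence against the null hypothesis.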
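
To see how these terms fit together, here is a minimal sketch in Python (not the software used in the lab slides). It assumes a made-up example: a coin is flipped 20 times and lands heads 16 times. The null hypothesis is that the coin is fair, the test statistic is the number of heads, and the p-value is the probability of a result at least this far from 10 heads if the coin really is fair. The function name and the data are hypothetical.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for the null hypothesis that a coin is fair."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected number of heads
    # as the outcome we actually observed (in either direction).
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical data: 16 heads in 20 flips.
p_value = two_sided_binomial_p_value(heads=16, flips=20)
print(f"p-value = {p_value:.4f}")  # prints "p-value = 0.0118"
```

Because 0.0118 is below the conventional 0.05 threshold, we would reject the null hypothesis that the coin is fair in this made-up example.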

Significance Testing

Variability and Significance Testing

How do we know if two variables in our dataset are related in a statistically meaningful way? Significance testing allows us to compare patterns of variation between two variables in our data to determine whether there is a consistent and robust association between them. Learn more in the slide presentation at left.

Check out this easy-to-follow explanation from the experts at Data Demystified. The video at right puts significance testing in simple, relatable terms.

Food for Thought

A solid conceptual understanding of hypothesis testing is important for evaluating associations in your analysis and for determining whether there is a statistically significant relationship between two variables. Strengthen your understanding by reading the article below, which approaches hypothesis testing from a conceptual, rather than a mathematical, standpoint.

Hypothesis Testing — The What, Why, and How
Understanding the intuition behind Hypothesis Testing. What exactly it is, why do we do it and how do Data Scientists perform it. Let’s…

Exploring Associations, Comparing Groups

The method of analysis you'll use will depend on the characteristics of the variables you are examining. Refer to the flowchart below for guidance. Review the slide presentations and supplementary videos corresponding with the analysis you'll be using. You may need to use more than one technique.

Cross tabulations

Crosstabs are useful for comparing the variation in two dichotomous variables, that is, variables with only two categories. Review the instructions for using the "crosstabs" command to examine your variables in the slide presentation below. For more in-depth information about the Chi-Square test statistic, which is used to test whether crosstabulated variables are independent of one another, see the supplemental video, "Intro to Chi Square."

Copy of CCS Cross Tabulations
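
If you'd like to see the arithmetic behind a Chi-Square test on a crosstab, the sketch below works it out in Python (not the "crosstabs" command from the slides). It assumes a made-up 2x2 table of counts and omits the continuity correction that some software applies to 2x2 tables, so your software's output may differ slightly. The function name, variable names, and counts are all hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts.

    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; in that case
    # P(chi-square >= x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical crosstab: rows are two groups, columns are yes/no responses.
counts = [
    [30, 20],  # group A: 30 yes, 20 no
    [15, 35],  # group B: 15 yes, 35 no
]
chi2, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}")  # prints "chi-square = 9.09"
print(f"p-value = {p_value:.4f}")
```

In this made-up table the p-value falls well below 0.05, so we would conclude that group membership and response are not independent.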

Correlations

Correlations are useful for comparing the variation in two or more ordinal or interval/ratio variables, which have ordered or continuous scales. Correlations are not appropriate for use with nominal variables. Review the lab instructions for using the "correlation" command to examine your variables in the slide presentation below. For additional information about Pearson's r, the statistic used to measure the magnitude of the association between correlated variables, see the supplemental video, "The Correlation Coefficient - Explained in Three Steps." Additionally, the article "Everything You Need to Know about Interpreting Correlations" from Towards Data Science has great visuals of what correlations of various magnitudes look like as scatterplots.

Copy of CCS Correlations

Everything you need to know about interpreting correlations
Not all correlations are what they seem
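
For a sense of what the "correlation" command computes, here is a minimal Python sketch of Pearson's r (not the lab software). It assumes two made-up interval/ratio variables measured on the same five cases; the function name, variable names, and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How much the two variables vary together around their means.
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own.
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical data for five respondents.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.2f}")  # prints "r = 0.98"
```

Values of r range from -1 to 1. A value near 1, as in this made-up example, indicates a strong positive association; a value near 0 indicates little or no linear association.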

T-Tests

T-tests are useful for comparing the mean score on one variable between two groups, when the dependent variable is ordinal or interval/ratio and the independent variable is dichotomous. T-tests require you to define a "grouping" variable, which is dichotomous. The objective of a t-test is to compare the average score on a variable of interest between the two defined groups. Review the lab instructions for using the "t-test" command to examine your variables in the slide presentation below. For additional information about the t statistic, which measures the size of the difference in mean scores relative to the variability in the data, see the supplemental video, "T-Tests: A Matched Pair Made in Heaven."

Copy of CCS T-Tests
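
To make the comparison of group means concrete, the sketch below computes a t statistic in Python (not the "t-test" command from the slides). It assumes Welch's version of the test, which does not require the two groups to have equal variances; your software may use a different variant by default. The function name, group names, and scores are hypothetical. The standard library has no t distribution, so this sketch stops at the statistic and its degrees of freedom and leaves the p-value to your statistical software.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / group size).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    # Difference in means, scaled by the combined standard error.
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical scores, split by a dichotomous grouping variable.
group_a = [4, 5, 6, 7, 8]
group_b = [1, 2, 3, 4, 5]

t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")  # prints "t = 3.00, df = 8.0"
```

With 8 degrees of freedom, the two-tailed critical value at the 0.05 level is about 2.31, so a t of 3.00 in this made-up example would lead us to reject the null hypothesis that the two group means are equal.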

Class Activity

Copy of Testing Hypotheses

Summary

Exploring the relationships between variables is an important analytic step that helps in determining what is interesting, and therefore worth reporting, about your data. Your readers will only have so much attention to devote to your research report, so you want to be able to bring their focus to what matters the most. Additionally, the absence of an expected relationship may be worth drawing attention to. When reporting relationships, we err on the side of caution, taking care not to claim a relationship exists when in fact there is none.
