Correlation measures the strength and direction of the linear relationship between two variables. It is a number between -1 and 1. A correlation of 1 indicates a perfect positive relationship, -1 a perfect negative relationship, and 0 means no linear relationship. Correlation doesn't imply causation.
Key terms:
Positive correlation: As one variable increases, the other increases.
Negative correlation: As one variable increases, the other decreases.
No correlation: No consistent relationship between the variables.
The correlation coefficient (r): Values range from -1 to +1, indicating strength and direction.
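As a quick illustration, here is a minimal sketch in Python showing how r can be computed; the height and weight values are made up for the example.

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) measurements
height = np.array([150, 160, 165, 170, 180, 185])
weight = np.array([52, 60, 63, 68, 78, 82])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r for the pair
r = np.corrcoef(height, weight)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to +1: strong positive linear relationship
```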
Regression, on the other hand, models the relationship between a dependent variable and one or more independent variables. It estimates how the dependent variable changes as the independent variable(s) change. The most common form is linear regression, which fits a straight line to the data; like correlation it captures the direction of the relationship, but its focus is on predicting one variable from the other(s).
Pearson/Spearman correlation is used for two variables that are continuous (Pearson) or ordinal/ranked (Spearman), for example height and weight. It measures the strength and direction of the relationship but doesn't imply causation.
Chi-square is used for categorical data (nominal or ordinal) with 2 categorical variables (e.g., gender and voting preference), testing for association without indicating direction.
Regression models the relationship between a dependent variable and one or more independent variables, predicting the outcome and estimating the strength and direction of the relationship. Unlike correlation, it distinguishes predictors from an outcome and is used for prediction; it can hint at possible causal links, but only careful study design can establish causation.
| | Chi-Square | Pearson Correlation | Spearman Correlation | Regression |
|---|---|---|---|---|
| Data type | Categorical | Continuous | Ordinal or continuous | Continuous (and sometimes categorical) |
| Purpose | Tests for association / independence | Measures strength and direction of a linear relationship | Measures a monotonic relationship | Models relationships; predicts an outcome |
| Measures strength | No | Yes | Yes | Yes |
| Measures direction | No | Yes | Yes | Yes |
| Predicts outcome | No | No | No | Yes |
| Independent variables | 2 | 1 | 1 | 1 or more |
| Dependent variables | — | 1 | 1 | 1 |
Imagine running a lemonade stand.
Chi-Square Statistic (χ²): Tells you whether the weather (sunny or rainy) is associated with lemonade sales (high or low).
p-value: Tells you whether the relationship between weather and sales is statistically significant (e.g., a p-value of 0.03 means an association this strong would be unlikely to occur by chance if weather and sales were unrelated).
Contingency Table: Shows the number of high and low sales for sunny vs. rainy days.
Effect Size (Cramér’s V or Phi φ): Tells you how strong the relationship is between weather and sales (small, medium, or large effect).
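A minimal sketch of how these pieces might be computed in Python, using SciPy's chi2_contingency on a weather-vs-sales contingency table; the counts are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = weather, columns = sales level
#                 high sales   low sales
table = np.array([[30, 10],    # sunny days
                  [12, 28]])   # rainy days

chi2, p, dof, expected = chi2_contingency(table)

# Cramér's V: effect size for a contingency table (for a 2x2 table it equals phi)
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2 = {chi2:.2f}, p = {p:.3f}, Cramér's V = {v:.2f}")
```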
Pearson/Spearman Correlation (r): Measures how price and sales are related.
+1 means as price goes up, sales always go up.
-1 means as price goes up, sales always go down.
0 means no linear relationship.
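A minimal sketch in Python, using SciPy's pearsonr and spearmanr on made-up price and sales figures.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical prices (£) and daily lemonade sales
price = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0])
sales = np.array([120, 110, 95, 80, 70, 55, 40])

r, p_r = pearsonr(price, sales)       # linear relationship
rho, p_rho = spearmanr(price, sales)  # monotonic relationship (based on ranks)

print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
# Both near -1: higher prices go with lower sales
```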
Regression Coefficients (β): Show how much sales change for each unit change in the price of lemonade.
Intercept (β₀): The expected sales when the price is zero.
R²: Tells you how much of the variation in sales can be explained by price changes.
p-values: Tests if price significantly affects sales.
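These quantities can all be read off a fitted model. Here is a minimal sketch using statsmodels OLS on made-up price and sales data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical prices (£) and daily lemonade sales
price = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0])
sales = np.array([120, 110, 95, 80, 70, 55, 40])

X = sm.add_constant(price)   # adds the intercept term beta_0
model = sm.OLS(sales, X).fit()

print(model.params)     # [intercept, slope]: expected sales at price 0, and change per £1
print(model.rsquared)   # share of the variation in sales explained by price
print(model.pvalues)    # does price significantly affect sales?
```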
Regression helps you understand how multiple factors (predictors) affect one outcome (dependent variable). In the case of selling lemonade, the dependent variable could be sales, and the independent variables (predictors) could include price, temperature, and advertising effort.
Description: Linear regression models the relationship between two variables with a straight line.
Example: If price increases by £1, sales decrease by 50 lemons. This suggests a linear relationship: for each £1 increase in price, you lose 50 lemons in sales.
Key Idea: It shows a constant rate of change between the independent and dependent variables (price and sales).
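A minimal sketch of that constant rate of change, fitting a straight line with NumPy's polyfit to invented data chosen so the slope comes out near -50.

```python
import numpy as np

# Hypothetical data consistent with "each £1 increase loses about 50 lemons"
price = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
sales = np.array([150, 126, 99, 76, 50])

slope, intercept = np.polyfit(price, sales, deg=1)  # fit a straight line
print(f"sales ≈ {intercept:.0f} + ({slope:.0f}) * price")  # slope near -50
```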
Description: Multiple regression models the relationship between two or more independent variables and a dependent variable.
Example: Predicting lemonade sales based on price and marketing spend. The model could show how each factor (price and marketing spend) influences sales.
Key Idea: It helps you understand how multiple factors work together to affect the outcome (sales).
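A minimal sketch of a multiple regression in Python, again with statsmodels and invented price, marketing-spend, and sales figures.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: price (£), marketing spend (£), and daily sales
price     = np.array([1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5])
marketing = np.array([0,   10,  0,   10,  0,   10,  0,   10 ])
sales     = np.array([120, 140, 100, 118, 78,  101, 60,  79 ])

X = sm.add_constant(np.column_stack([price, marketing]))
model = sm.OLS(sales, X).fit()

# One coefficient per predictor: the effect of each factor with the other held fixed
print(model.params)   # [intercept, price effect, marketing effect]
```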
Description: Logistic regression is used when the dependent variable is binary (e.g., yes/no or high/low).
Example: Predicting if sales are above or below a threshold (e.g., whether you will sell more than 100 lemons) based on factors like price and temperature.
Key Idea: It predicts the probability of one outcome or the other (like a yes/no decision).
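A minimal sketch using scikit-learn's LogisticRegression; the price, temperature, and "sold more than 100 lemons" values are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [price in £, temperature in °C] and whether >100 lemons were sold
X = np.array([[1.0, 30], [1.0, 20], [1.5, 32], [1.5, 18],
              [2.0, 31], [2.0, 19], [2.5, 33], [2.5, 17]])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # 1 = sold more than 100 lemons

clf = LogisticRegression().fit(X, y)

# Predicted probability of selling more than 100 lemons at £1.20 on a 28 °C day
print(clf.predict_proba([[1.2, 28]])[0, 1])
```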
Description: Logarithmic regression models a relationship in which the rate of change shrinks as the independent variable increases (i.e., diminishing returns).
Example: If the price of lemonade increases, initially sales drop sharply, but as price continues to rise, the effect on sales becomes smaller (diminishing returns).
Key Idea: It’s useful when the effect of one variable (like price) has a large impact at first, but that impact tapers off as the independent variable increases.
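A minimal sketch in Python: take the logarithm of price and fit an ordinary least-squares line to it, using invented data that show diminishing returns.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data showing diminishing returns of price increases on sales
price = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
sales = np.array([150, 118, 100, 88, 79, 71, 65, 60])

# Logarithmic model: sales = b0 + b1 * ln(price)
X = sm.add_constant(np.log(price))
model = sm.OLS(sales, X).fit()

# b1 is negative; each doubling of price reduces sales by roughly the same amount (b1 * ln 2)
print(model.params)
```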
In summary, regression models help predict sales based on various factors and show the strength and direction of these relationships, whether linear or non-linear, and even when the outcome is binary.