Regression

Purpose:

To predict or explain the variation in one variable (DV or Y) based on one or more explanatory/predictor variables (IV or X). Can be used to explore potential CAUSAL relationships between variables, although regression alone cannot prove causation. Also referred to as statistical modeling, prediction, and forecasting.

Answers three main questions (each unpacked under Output Interpretation below):

Q1: How much variance in the DV does our model explain?

Q2: Does our model (the IV) explain a significant amount of variability in the DV?

Q3: For each 1-unit increase in the IV, how much change in the DV can we expect?

Context used:

For more information (explanation, interpretation, write-up) specifically on hierarchical multiple regression, see this website 


Jamovi Walkthrough:


Output Interpretation:

First, check that the IV and DV are correlated before running the regression:

Pearson's r: Test statistic (direction) and measure of effect size (magnitude). (Note: In the social sciences, correlations anywhere near 1 are essentially never seen because human behavior is so variable; a magnitude around .4 is considered excellent!)

If r is closer to 0, there is a WEAK correlation magnitude.

If r is closer to 1 (or -1), there is a STRONG correlation magnitude.

If r is positive, there is a POSITIVE correlation direction (both variables move together in the same direction).

If r is negative, there is a NEGATIVE correlation direction (the variables move in opposite directions).

p-value: The probability of seeing a relationship at least this strong if, in reality, there were none. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The correlation IS significant; continue on and run the regression.

If p > .05, fail to reject the null hypothesis. The correlation is NOT significant; STOP HERE.
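For illustration only (the course workflow is in Jamovi; the variable names and data below are made up), this correlation check could be reproduced in Python with scipy:

    # Pearson correlation check before running a regression (illustrative sketch)
    from scipy import stats

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV scores (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV scores (made-up data)

    r, p = stats.pearsonr(x, y)       # Pearson's r and its p-value
    print(f"r = {r:.2f}, p = {p:.3f}")
    # If p < .05, the correlation is significant and the regression is worth running.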


Q1: How much variance in the DV does our model explain? 

R: Measure of the overall association/correlation between the IV(s) and the DV.

R²: Effect size for regression (the "coefficient of determination"): the proportion (%) of variability in the DV explained by the IV, or by ALL of the predictor variables together.

"The current model explains ___% of the variance in our DV"


Q2: Does our model (the IV) explain a significant amount of variability in the DV? 

df (degrees of freedom): The number of values in the analysis that are free to vary; reported alongside F and needed to evaluate the null hypothesis.

F-value: Tests whether the model as a whole explains significantly more variability in the DV than no model at all (i.e., whether the variance explained differs from 0).

p-value: The probability of seeing a model fit at least this strong if, in reality, the model explained nothing. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The model DOES explain a significant amount of variability; continue to the next question.

If p > .05, fail to reject the null hypothesis. The model does NOT explain a significant amount of variability; STOP HERE.
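The same illustrative statsmodels fit also exposes the overall F-test (hypothetical data again; Jamovi shows these values in its regression output):

    # Overall model (F) test for the illustrative regression
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()

    print(f"F({int(model.df_model)}, {int(model.df_resid)}) = {model.fvalue:.2f}, "
          f"p = {model.f_pvalue:.3f}")
    # p < .05 means the model explains a significant amount of variability in the DV.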


Q3: For each 1-unit increase in the IV, how much change in the DV can we expect? (Making a line!)

B/"Estimate": In original measure units, good for context. Tells us the SLOPE (interpreted the same as Person's r)

If B is closer to 0, the DV does not change much for each 1-unit increase in the IV.

If B is farther from 0, the DV changes more for each 1-unit increase in the IV.

If B is negative, the DV will DECREASE for every 1-point increase in the IV.

If B is positive, the DV will INCREASE for every 1-point increase in the IV.

"For every 1 point increase in the IV, we can expect a ____ point increase in the DV"

β/"Stand. Estimate": In units of standard deviation, good for comparison and reporting. This variable does not matter if there is only one predictor variable (IV).

If β is lower, it is a less important predictor with less impact.

If β is higher, it is a more important predictor, with more impact.

SE (standard error): The uncertainty in the slope estimate, i.e., how much the estimated slope would be expected to vary from sample to sample.

If the SE is small, the slope is estimated precisely (the data points sit close to the line of best fit).

If the SE is large, the slope is estimated imprecisely (the data points scatter farther from the line of best fit).

t-value: The slope divided by its standard error (B / SE); used to test whether the slope differs significantly from 0.

If the t-value is farther from 0, this indicates greater confidence in the coefficient as a predictor.

If the t-value is closer to 0, this would indicate low reliability in the coefficient as a predictor, closer to a flat line.

p-value: The probability of seeing a slope at least this steep if, in reality, the IV had no effect. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The IV IS a significant contributor to the model.

If p > .05, fail to reject the null hypothesis. The IV is NOT a significant contributor to the model.
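All four of these coefficient values (B, SE, t, p) can be pulled from the same illustrative statsmodels fit (hypothetical data; in Jamovi they appear together in the model coefficients output):

    # Coefficient table values (B, SE, t, p) for the illustrative regression
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()

    intercept, slope = model.params           # a (intercept) and B (slope)
    print(f"B = {slope:.2f}, SE = {model.bse[1]:.2f}, "
          f"t = {model.tvalues[1]:.2f}, p = {model.pvalues[1]:.3f}")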


Now, we can build the regression line from all of this output using Y = a + bX (Y = DV, a = intercept estimate, b = slope, X = IV).

Note: MULTIPLE regression extends this equation with one slope term per predictor: Y = a + b₁X₁ + b₂X₂ + …
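Worked example with hypothetical numbers: if the intercept estimate is a = 2.50 and the slope is b = 0.75, then someone scoring X = 10 on the IV has a predicted DV score of Y = 2.50 + 0.75(10) = 10.00.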


APA Format:

Appropriate data visualization: Scatterplot (with line of best fit), line graph


Sample table: https://apastyle.apa.org/style-grammar-guidelines/tables-figures/sample-tables#regression


Sample write-up:

Simple linear regression analysis was conducted to evaluate the extent to which [independent/predictor variable] could predict [dependent/outcome/criterion variable].

A significant regression [was/was not] (1) found (F([df for regression] (2), [df for residual] (3)) = [F value] (4), p = [p value] (5)). The R² was [R² value] (6), indicating that [independent variable] explained approximately [R² multiplied by 100]% (7) of the variance in [dependent variable]. The regression equation was:

[dependent variable] = [constant] (8) + [slope of the regression line](9)([independent variable]).

That is, for each one [independent variable unit of measurement] increase in [independent variable], the predicted [dependent variable] [increased/decreased] (10) by approximately [slope of regression line] (11) [dependent variable unit of measurement].

Confidence intervals indicated that we can be 95% certain that the slope to predict [dependent variable] from [independent variable] is between [lower bound of confidence interval for independent variable] (12) and [upper bound of confidence interval for independent variable] (13).


Note: Replace the [text in square brackets] with information from your own simple linear regression analysis. The numbered notes below explain how to report each value and where to find it.

(1) Your simple linear regression is significant if the p-value presented in the Sig. column of your ANOVA table is less than or equal to the alpha level you selected for your regression. Select an alpha level before you conduct your regression analysis.  An alpha level of .05 is typical. Don't interpret a non-significant regression.

(2) and (3) Report the degrees of freedom (df) from the Regression and Residual rows of the ANOVA table, respectively.

(4) Report the F value from the ANOVA table to two decimal places.  Add a leading zero to your F value if it is less than 1.00 (for example, an F value of .791 would be reported as 0.79).

(5) Report the exact p-value as per the ANOVA table to two or three decimal places, and do not add a leading zero.

(6) You will find the value of R² in the R Square column of the Model Summary table. Report this value to two decimal places. If your sample size is small, consider reporting Adjusted R Square.

(7) To calculate the percentage of variance in the dependent variable explained by the independent variable, multiply R² (from (6) above) by 100.

(8) Report the constant, i.e., the value in the Estimate column of the Intercept row, to two decimal places.

(9) and (11) You will find the slope of the regression line in the Estimate column in the row for your independent variable of the Coefficients table.  Report this value to two decimal places.

(10) When the independent variable increases, the predicted value of the dependent variable will increase if the slope is positive, and decrease if the slope is negative.

(12) and (13) Report the Lower and Upper Bounds presented in the row for the independent variable under 95.0% Confidence Interval for B in the Coefficients table. Report these values to two decimal places.
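If you want to double-check the values for notes (12) and (13) outside the point-and-click output, the 95% confidence interval for the slope could be obtained from the same illustrative statsmodels fit (hypothetical data):

    # 95% confidence interval for the slope (illustrative sketch)
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    lower, upper = model.conf_int(alpha=0.05)[1]   # row 1 = slope (row 0 = intercept)
    print(f"95% CI for the slope: [{lower:.2f}, {upper:.2f}]")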

Source: https://ezspss.com/report-simple-linear-regression-from-spss-in-apa-style/

Jamovi tutorial video for (linear) regression:

Linear Regression Tutorial.mp4