Regression

Purpose:

To predict or explain the variation in one variable (DV or Y) based on one or more explanatory/predictor variables (IV or X). Can be used to explore potential CAUSAL relationships between variables, although regression alone cannot prove causation. Also referred to as statistical modeling, prediction, and forecasting.

Answers three main questions (each unpacked under Output Interpretation below):

Q1: How much variance in the DV does our model explain?

Q2: Does our model (the IV) explain a significant amount of variability in the DV?

Q3: For each 1-unit increase in the IV, how much change in the DV can we expect?

Context used:

For more information (explanation, interpretation, write-up) specifically on hierarchical multiple regression, see this website 


Jamovi Walkthrough:


Output Interpretation:

First, check that the IV and DV are correlated before running the regression:

Pearson's r: Test statistic (direction) and measure of effect size (magnitude). (Note: In the social sciences, correlations anywhere near 1 are essentially never seen because human behavior is so variable; a magnitude around .4 is considered excellent!)

If r is closer to 0, there is a WEAK correlation magnitude.

If r is closer to 1 (or -1), there is a STRONG correlation magnitude.

If r is positive, there is a POSITIVE correlation direction (both variables move together in the same direction).

If r is negative, there is a NEGATIVE correlation direction (the variables move in opposite directions).

p-value: The probability of seeing a relationship at least this strong if, in reality, there were none. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The correlation IS significant; continue on and run the regression.

If p > .05, fail to reject the null hypothesis. The correlation is NOT significant; STOP HERE.
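For illustration only (the course workflow is in Jamovi; the variable names and data below are made up), this correlation check could be reproduced in Python with scipy:

    # Pearson correlation check before running a regression (illustrative sketch)
    from scipy import stats

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV scores (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV scores (made-up data)

    r, p = stats.pearsonr(x, y)       # Pearson's r and its p-value
    print(f"r = {r:.2f}, p = {p:.3f}")
    # If p < .05, the correlation is significant and the regression is worth running.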


Q1: How much variance in the DV does our model explain? 

R: Measure of the overall association/correlation between the IV(s) and the DV.

R²: Effect size for regression (the "coefficient of determination"): the proportion (%) of variability in the DV explained by the IV, or by ALL of the predictor variables together.

"The current model explains ___% of the variance in our DV"


Q2: Does our model (the IV) explain a significant amount of variability in the DV? 

df (degrees of freedom): The number of values in the analysis that are free to vary; reported alongside F and needed to evaluate the null hypothesis.

F-value: Tests whether the model as a whole explains significantly more variability in the DV than no model at all (i.e., whether the variance explained differs from 0).

p-value: The probability of seeing a model fit at least this strong if, in reality, the model explained nothing. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The model DOES explain a significant amount of variability; continue to the next question.

If p > .05, fail to reject the null hypothesis. The model does NOT explain a significant amount of variability; STOP HERE.
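The same illustrative statsmodels fit also exposes the overall F-test (hypothetical data again; Jamovi shows these values in its regression output):

    # Overall model (F) test for the illustrative regression
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()

    print(f"F({int(model.df_model)}, {int(model.df_resid)}) = {model.fvalue:.2f}, "
          f"p = {model.f_pvalue:.3f}")
    # p < .05 means the model explains a significant amount of variability in the DV.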


Q3: For each 1-unit increase in the IV, how much change in the DV can we expect? (Making a line!)

B/"Estimate": In original measure units, good for context. Tells us the SLOPE (interpreted the same as Person's r)

If B is closer to 0, the DV does not change much for each 1-unit increase in the IV.

If B is farther from 0, the DV changes more for each 1-unit increase in the IV.

If B is negative, the DV will DECREASE for every 1-point increase in the IV.

If B is positive, the DV will INCREASE for every 1-point increase in the IV.

"For every 1 point increase in the IV, we can expect a ____ point increase in the DV"

β/"Stand. Estimate": In units of standard deviation, good for comparison and reporting. This variable does not matter if there is only one predictor variable (IV).

If β is lower, it is a less important predictor with less impact.

If β is higher, it is a more important predictor, with more impact.

SE (standard error): The uncertainty in the slope estimate, i.e., how much the estimated slope would be expected to vary from sample to sample.

If the SE is small, the slope is estimated precisely (the data points sit close to the line of best fit).

If the SE is large, the slope is estimated imprecisely (the data points scatter farther from the line of best fit).

t-value: The slope divided by its standard error (B / SE); used to test whether the slope differs significantly from 0.

If the t-value is farther from 0, this indicates greater confidence in the coefficient as a predictor.

If the t-value is closer to 0, this would indicate low reliability in the coefficient as a predictor, closer to a flat line.

p-value: The probability of seeing a slope at least this steep if, in reality, the IV had no effect. Looking for a small value (p < .05).

If p < .05, reject the null hypothesis. The IV IS a significant contributor to the model.

If p > .05, fail to reject the null hypothesis. The IV is NOT a significant contributor to the model.
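All four of these coefficient values (B, SE, t, p) can be pulled from the same illustrative statsmodels fit (hypothetical data; in Jamovi they appear together in the model coefficients output):

    # Coefficient table values (B, SE, t, p) for the illustrative regression
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()

    intercept, slope = model.params           # a (intercept) and B (slope)
    print(f"B = {slope:.2f}, SE = {model.bse[1]:.2f}, "
          f"t = {model.tvalues[1]:.2f}, p = {model.pvalues[1]:.3f}")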


Now, we can build the regression line from all of this output using Y = a + bX (Y = DV, a = intercept estimate, b = slope, X = IV).

Note: MULTIPLE regression extends this equation with one slope term per predictor: Y = a + b₁X₁ + b₂X₂ + …
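Worked example with hypothetical numbers: if the intercept estimate is a = 2.50 and the slope is b = 0.75, then someone scoring X = 10 on the IV has a predicted DV score of Y = 2.50 + 0.75(10) = 10.00.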


APA Format:

Appropriate data visualization: Scatterplot (with line of best fit), line graph


Sample table: https://apastyle.apa.org/style-grammar-guidelines/tables-figures/sample-tables#regression


Sample write-up:

Simple linear regression analysis was conducted to evaluate the extent to which [independent/predictor variable] could predict [dependent/outcome/criterion variable].

A significant regression [was/was not] (1) found (F([df for regression] (2), [df for residual] (3)) = [F value] (4), p = [p value] (5)). The R² was [R² value] (6), indicating that [independent variable] explained approximately [R² multiplied by 100]% (7) of the variance in [dependent variable]. The regression equation was:

[dependent variable] = [constant] (8) + [slope of the regression line](9)([independent variable]).

That is, for each one [independent variable unit of measurement] increase in [independent variable], the predicted [dependent variable] [increased/decreased] (10) by approximately [slope of regression line] (11) [dependent variable unit of measurement].

Confidence intervals indicated that we can be 95% certain that the slope to predict [dependent variable] from [independent variable] is between [lower bound of confidence interval for independent variable] (12) and [upper bound of confidence interval for independent variable] (13).


Note: Replace the [text in square brackets] with information from your own simple linear regression analysis. The numbered notes below explain how to report each value and where to find it.

(1) Your simple linear regression is significant if the p-value presented in the Sig. column of your ANOVA table is less than or equal to the alpha level you selected for your regression. Select an alpha level before you conduct your regression analysis.  An alpha level of .05 is typical. Don't interpret a non-significant regression.

(2) and (3) Report the degrees of freedom (df) from the Regression and Residual rows of the ANOVA table, respectively.

(4) Report the F value from the ANOVA table to two decimal places.  Add a leading zero to your F value if it is less than 1.00 (for example, an F value of .791 would be reported as 0.79).

(5) Report the exact p-value as per the ANOVA table to two or three decimal places, and do not add a leading zero.

(6) You will find the value of R² in the R Square column of the Model Summary table. Report this value to two decimal places. If your sample size is small, consider reporting Adjusted R Square.

(7) To calculate the percentage of variance in the dependent variable explained by the independent variable, multiply R² (from (6) above) by 100.

(8) Report the constant, i.e., the value in the Estimate column of the Intercept row, to two decimal places.

(9) and (11) You will find the slope of the regression line in the Estimate column in the row for your independent variable of the Coefficients table.  Report this value to two decimal places.

(10) When the independent variable increases, the predicted value of the dependent variable will increase if the slope is positive, and decrease if the slope is negative.

(12) and (13) Report the Lower and Upper Bounds presented in the row for the independent variable under 95.0% Confidence Interval for B in the Coefficients table. Report these values to two decimal places.
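If you want to double-check the values for notes (12) and (13) outside the point-and-click output, the 95% confidence interval for the slope could be obtained from the same illustrative statsmodels fit (hypothetical data):

    # 95% confidence interval for the slope (illustrative sketch)
    import statsmodels.api as sm

    x = [2, 4, 5, 7, 9, 11, 12, 14]   # IV (made-up data)
    y = [1, 3, 4, 5, 8, 9, 11, 13]    # DV (made-up data)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    lower, upper = model.conf_int(alpha=0.05)[1]   # row 1 = slope (row 0 = intercept)
    print(f"95% CI for the slope: [{lower:.2f}, {upper:.2f}]")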

Source: https://ezspss.com/report-simple-linear-regression-from-spss-in-apa-style/

Jamovi tutorial video for (linear) regression:

Linear Regression Tutorial.mp4