Module 17

Multiple regression I

Introduction

  • Simple linear regression allows us to analyze how an outcome variable depends on one predictor variable.

  • However, an outcome variable often depends on more than just one factor. For example, the likability of a person may depend on appearance, personality, talent, etc. In such cases, we need to employ multiple regression.

  • By adding predictor(s) one by one, in separate steps, we can also observe (1) the incremental predictive power of the newly added variable(s) and (2) how the newly added variable(s) influence the predictive power of the existing predictor(s) and of the overall model.

1. Multiple Regression

1.1 What is multiple regression?

Multiple regression allows us to analyze the relationship between one dependent variable (outcome) and two or more independent variables (predictor variables).

In general, a regression equation looks like the following:

y-hat = b0 + b1*x1 + b2*x2 + ... + bk*xk

where

  • y-hat = estimated dependent variable,

  • b1, b2, ... = regression coefficients,

  • x1, x2, ... = independent variables,

  • b0 = constant (i.e., intercept).

The regression equation is often referred to as the regression "model", because it illustrates how an outcome can be explained by multiple factors.

Multiple regression allows us to analyze the relationship between the outcome and the predictors at two levels:

  1. at the overall model level: we can evaluate how well the entire model predicts the dependent variable

  2. at the predictor level: we can evaluate how well each independent variable predicts the dependent variable

1.2 Example 1: Multiple regression on GPA

Many university students are interested in knowing what actually influences their GPA. Some argue that perceived intelligence may be a predictor. Others think that hours of sleep and hours spent on social network sites may play an important role in learning and, hence, affect GPA.

Suppose we want to know how these factors are related to GPA and to what extent they can predict it. We can test the relationships between these variables and GPA, and how well they predict GPA, using “Linear Regression” under “Regression” in jamovi.

  • Select "GPA" as Dependent Variable, and "PerInt", "Sleep", and "SNS" as Covariates

  • For the Assumption Checks, check Collinearity statistics (and, optionally, the Normality test)

  • For the Overall Model Test, check F test, and under Fit Measures, check Adjusted R-squared

  • For the predictor level, under Model Coefficients, check Confidence interval for the (coefficient) Estimate and Standardized estimate
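
The same analysis can also be run outside jamovi. Below is a minimal sketch in Python using statsmodels; the file name gpa.csv is an assumption, and the column names follow the variables above.

    # Minimal sketch, assuming the data sit in a (hypothetical) CSV file "gpa.csv"
    # with columns GPA, PerInt, Sleep, and SNS.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("gpa.csv")
    model = smf.ols("GPA ~ PerInt + Sleep + SNS", data=data).fit()
    print(model.summary())   # overall F test, R-squared, adjusted R-squared,
                             # coefficients, SEs, 95% CIs, t-values, p-values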

To interpret the results:

  • R = multiple coefficient of correlation

  • R-squared = coefficient of multiple determination

  • Adjusted R-squared = R-squared adjusted for the number of predictors

  • F-value & p-value = overall model significance test

  • df1 = degrees of freedom (regression) = k = number of predictors (3)

  • df2 = degrees of freedom (error) = n - k - 1, where n = sample size (1000 - 3 - 1 = 996)
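
These quantities are connected: the overall F statistic can be computed from R-squared and the two degrees of freedom as

F = (R-squared / df1) / ((1 - R-squared) / df2)

so, for a given sample size and number of predictors, a larger R-squared yields a larger F and a smaller p-value.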

Model Coefficients table:

  • Predictor = the intercept (constant) and the predictor variables

  • Estimate = regression coefficients

  • SE = standard error of the coefficient estimate

  • 95% CI = 95% confidence interval of the estimate

  • t-value & p-value = predictor significance test

  • Stand. Estimate = standardized regression coefficients (beta, typically ranging from -1 to 1)
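
jamovi computes the standardized estimates for you. To illustrate what they are, here is a minimal sketch in Python (statsmodels), reusing the hypothetical gpa.csv file from above: z-scoring the outcome and the predictors and refitting the model gives the beta coefficients.

    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("gpa.csv")                              # hypothetical file name
    cols = ["GPA", "PerInt", "Sleep", "SNS"]
    z = data[cols].apply(lambda c: (c - c.mean()) / c.std())   # z-score each variable
    betas = smf.ols("GPA ~ PerInt + Sleep + SNS", data=z).fit().params
    print(betas)                                               # standardized (beta) coefficients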

Conclusion/ Interpretation (APA format):

A multiple regression analysis was conducted to predict GPA from perceived intelligence (PerInt), hours of sleep, and hours spent on social network sites (SNS). These variables statistically significantly predicted GPA, F(3, 996) = 37.98, p < .001, R-squared = .10, indicating that 10% of the variance in GPA was explained. Both perceived intelligence and SNS significantly predicted GPA, PerInt: β = .27, p < .001; SNS: β = -.12, p < .001. Hours of sleep did not significantly predict GPA, β = -.03, p = .41.

Regression model equation:

The estimated GPA = 3.128 + 0.004*PerInt - 0.007*Sleep - 0.019*SNS
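
To illustrate how the equation is used (the predictor values below are hypothetical), a student with PerInt = 5, Sleep = 7, and SNS = 2 would have a predicted GPA of 3.128 + 0.004*5 - 0.007*7 - 0.019*2 ≈ 3.06.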

1.3 Effect size interpretation

As with other statistical tests, we can measure the effect size of a whole-model regression analysis. One measure of the effect size for the whole-model regression analysis is R2, the coefficient of determination: it tells us how much of the variance in the dependent variable can be explained by the predictor(s). Theoretically, R2 ranges from zero to one (0 - 1). E.g., if R2 = .21, it means that 21% of the variance in the DV can be explained by the predictor(s), and the remaining 79% of the variance in the DV is attributable to other factors not included in the model.

Below is a general guideline for interpreting the size of R2 in multiple regression analyses.
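
A commonly used set of benchmarks, based on Cohen's (1988) conventions for multiple regression, is approximately:

  • R2 ≈ .02: small effect

  • R2 ≈ .13: medium effect

  • R2 ≈ .26: large effect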

1.4 Linear-independence assumption (check for multi-collinearity)

A multiple regression analysis involves two or more predictors, and these predictor variables are often correlated with each other.

One of the mathematical assumptions underlying multiple regression is that predictor variables are linearly independent. In other words, predictors are assumed to be independent of one another. Suppose one predictor, A, is linearly dependent on some other predictors, e.g., B and C. The consequence is that B and C together can predict a significant proportion of the variability in A. In this case, we don't need A in the regression model, because B and C can do A's job in predicting the outcome. A becomes redundant and should not be included in the regression model.

To check the linear-independence assumption, we can use the tolerance and the variance inflation factor (VIF); tolerance is simply the reciprocal of VIF (tolerance = 1/VIF).

One commonly used criterion is as follows: if the VIF of a predictor is 10 or above (equivalently, if its tolerance is 0.10 or less), that predictor may be linearly dependent on some other predictor(s) and should be removed. Some researchers use a stricter criterion (e.g., VIF of 5 or above, tolerance of .20 or less).

For example, suppose we use perceived intelligence, sleeping hours, and basic self-control to predict GPA, and we want to check whether these three predictors show any multi-collinearity. Below is the tutorial on how to check the VIF and tolerance in jamovi.
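
For reference, the same check can be sketched in Python (statsmodels). The file name gpa2.csv is hypothetical; it is assumed to contain the columns PerInt, Sleep, and BSC. Tolerance is computed as 1/VIF.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    data = pd.read_csv("gpa2.csv")                             # hypothetical file name
    X = sm.add_constant(data[["PerInt", "Sleep", "BSC"]])      # predictors plus an intercept column

    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(X.values, i)           # VIF for this predictor
        print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")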

Conclusion/ interpretation (APA format):

The collinearity statistics indicated that multicollinearity was not a concern (PerInt: Tolerance = .95, VIF = 1.05; Sleep: Tolerance = .98, VIF = 1.02; BSC: Tolerance = .93, VIF = 1.07).

2. Hierarchical regression & R-square change

2.1 What is hierarchical regression?

  • Hierarchical regression is a specific type of multiple regression. You build more than one regression model by adding new variable(s) at each step.

  • It allows model comparison. By adding predictor variables step by step and comparing the current model with the previous one, you can observe the change in explained variance produced by the newly added predictor variable(s).

2.2 Example 2: Multiple hierarchical regression on procrastination

Some people have proposed that the personality traits of conscientiousness and agreeableness may predict procrastination tendency. Others argue that basic self-control should still predict procrastination even after the effects of conscientiousness and agreeableness are taken into account.

Our goal is to examine the incremental predictive power of basic self-control on procrastination after accounting for conscientiousness and agreeableness.

  • Select "IPS" (the composite score of procrastination) as Dependent Variable, and all the predictor variables, conscientiousness, agreeableness, basic self-control "BFM_C", "BFM_S", and "BSC" as Covariates

  • Importantly, we need to define the blocks under Model Builder: click + Add New Block, then add the target variable, BSC, to Block 2.

By doing this, jamovi will specify two models:

  • Model 1: IPS = constant + BFM_C + BFM_A

  • Model 2: IPS = constant + BFM_C + BFM_A + BSC

and perform the model comparison (i.e., compare the models with and without BSC).

Notice that the later model will always include all the predictor variables from the previous model.
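
Below is a minimal sketch of the same blockwise comparison in Python (statsmodels), assuming a hypothetical data file procrastination.csv with columns IPS, BFM_C, BFM_A, and BSC.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    data = pd.read_csv("procrastination.csv")                       # hypothetical file name

    model1 = smf.ols("IPS ~ BFM_C + BFM_A", data=data).fit()        # Block 1
    model2 = smf.ols("IPS ~ BFM_C + BFM_A + BSC", data=data).fit()  # Block 2 (adds BSC)

    print(model2.rsquared - model1.rsquared)    # R-squared change from adding BSC
    print(sm.stats.anova_lm(model1, model2))    # F test for the R-squared change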

From the results, we can interpret:

  • A hierarchical regression analysis was conducted to predict procrastination tendency from conscientiousness, agreeableness, and basic self-control.

  • These variables statistically significantly predicted procrastination tendency, F(3, 996) = 37.98, p < .001, R-squared = .45, indicating that 45% of the variance in procrastination tendency was explained.

  • By adding basic self-control (BSC), the model accounted for an additional 22.7% of explained variance, and this R-squared change was statistically significant, F(1, 996) = 441.32, p < .001.

  • All the predictor variables were statistically significant, conscientiousness: β = -.28, p < .001; agreeableness: β = .31, p < .001; basic self-control: β = -.49, p < .001.
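
For reference, the change statistics reported above follow the standard formulas:

R-squared change = R-squared of Model 2 - R-squared of Model 1

F change = (R-squared change / number of predictors added) / ((1 - R-squared of Model 2) / (n - k - 1)), where k is the number of predictors in Model 2.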

Regression model equation:

The estimated procrastination = 4.62 - 0.26*BFM_C + 0.25*BFM_A - 0.58*BSC

Module Exercise (4% of total course assessment)

Complete the exercise!

    • Now, if you think you're ready for the exercise, you can check your email for the link.

    • Remember to submit your answers before the deadline in order to earn the credits!