Biostatistics
Linear Regression Analysis
The table below depicts the data that we used in our lab session. It was collected from 5 different scenarios (Steppe, Dry Mediterranean, Humid Mediterranean, Temperate and Boreal), and includes 3 independent variables (Mean Annual Precipitation, Tree Species Richness, and a Fake Variable) together with the response variable, Tree Carbon Stocks.
We are testing two hypotheses here: (1) the Precipitation-Carbon Hypothesis; and (2) the Diversity-Carbon Hypothesis. Each posits a relationship between one independent variable (Mean Annual Precipitation or Tree Species Richness, respectively) and Tree Carbon Stocks.
Let's first create scatter plots for each pair of these variables. The X variable is called the predictor and the Y variable is called the response. We then ask whether a simple linear regression analysis provides statistically sound support for a relationship between the two plotted variables. I generated the scatter plots with fitted lines and performed the linear regression analyses in Excel. The corresponding R² values and p-values are shown below each graph.
(We have shown the procedures for using Excel to create a scatter plot, fit a line, and perform a linear regression analysis. You may refer to the handout for detailed steps.)
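The same fit-and-test workflow done in Excel can be sketched in Python. This is a minimal example using `scipy.stats.linregress`; the precipitation and carbon values here are hypothetical illustrations, not the actual lab data.

```python
# Hypothetical example data -- NOT the actual lab measurements.
from scipy.stats import linregress

map_mm = [250, 450, 800, 700, 500]   # Mean Annual Precipitation (mm)
carbon = [20, 45, 95, 80, 55]        # Tree Carbon Stocks (made-up units)

# Fit a simple linear regression: carbon ~ precipitation
fit = linregress(map_mm, carbon)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.2f}")
print(f"R^2 = {fit.rvalue**2:.3f}, p-value = {fit.pvalue:.4f}")
```

The returned object carries the slope coefficient, R² (as `rvalue` squared), and the p-value for the slope, i.e., the same three quantities highlighted in the Excel output.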
[Three scatter plots with fitted lines; the statistics shown below the graphs are:]
R² = 0.69, p-value = 0.08
R² = 0.92, p-value = 0.01
R² = 0.88, p-value = 0.02
The accompanying figure demonstrates the linear regression analysis result calculated in Excel.
Look for the highlighted parameters, which we will use in later analysis:
R Square (R²)
Coefficient
p-value
While R² (coefficient of determination) and p-values are both indicators used in the context of regression analysis, they measure different things, and one does not dictate the value of the other. It's possible to have a high R² with an insignificant p-value and vice versa.
R² Value:
The R² value measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). A high R² suggests that the model explains a large portion of the variance in the dependent variable.
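This definition can be checked by hand: R² = 1 − SS_res/SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares around the mean. A short sketch with hypothetical observed and fitted values:

```python
# Compute R^2 from its definition, 1 - SS_res/SS_tot,
# using hypothetical observed and fitted values (not the lab data).
y_obs = [20, 45, 95, 80, 55]
y_fit = [20.2, 47.3, 94.7, 81.1, 54.0]   # predictions from a fitted line

y_mean = sum(y_obs) / len(y_obs)
ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))  # residual sum of squares
ss_tot = sum((o - y_mean) ** 2 for o in y_obs)            # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```

A value near 1 means the fitted line leaves very little of the response's variance unexplained.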
P-value:
P-values are associated with hypothesis testing. In the context of regression, the p-value for each coefficient tests the null hypothesis that the coefficient is equal to zero (no effect). A small p-value (<0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to the model because changes in the predictor's value are related to changes in the response variable.
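Concretely, the p-value for a slope coefficient comes from a t-test of H0: slope = 0, with t = slope / SE(slope) on n − 2 degrees of freedom. The sketch below, using simulated data, computes this manually and checks it against the p-value that `scipy.stats.linregress` reports:

```python
import numpy as np
from scipy import stats

# Hypothetical noisy linear data (simulated, not the lab data).
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 * x + rng.normal(scale=3.0, size=10)

fit = stats.linregress(x, y)

# Manual test of H0: slope = 0 using t = slope / SE(slope), df = n - 2
t_stat = fit.slope / fit.stderr
p_manual = 2 * stats.t.sf(abs(t_stat), df=len(x) - 2)
print(f"p (scipy) = {fit.pvalue:.4g}, p (manual) = {p_manual:.4g}")
```

The two p-values agree, which shows the regression p-value is just a two-sided t-test on the slope.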
Now, let's consider a few scenarios:
(A) High R² and p-value < 0.05:
This is often what researchers hope for. It indicates that the model explains a significant portion of the variance in the dependent variable, and the predictors are statistically significant.
(B) High R² and p-value > 0.05:
This might happen when there's multicollinearity in the data. The model might explain a large portion of the variance (high R²), but individual predictors might not be statistically significant.
(C) Low R² and p-value < 0.05:
The model doesn't explain a lot of variance in the dependent variable, but the predictors are statistically significant. This might mean that although the predictors are related to the response, there's a lot of unexplained variability.
(D) Low R² and p-value > 0.05:
The model doesn't explain much of the variance, and predictors aren't statistically significant. This suggests that the model may not be very useful.
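Scenario (C) is easy to reproduce in a simulation: a weak but real effect in a large, noisy sample yields a low R² together with a significant p-value. This is an illustrative simulation with made-up parameters, not an analysis of the lab data:

```python
import numpy as np
from scipy import stats

# Simulated illustration of scenario (C): a real but weak effect
# in a large noisy sample -> low R^2 yet p < 0.05.
rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)   # true slope 0.2, with lots of noise

fit = stats.linregress(x, y)
print(f"R^2 = {fit.rvalue**2:.3f}, p-value = {fit.pvalue:.2e}")
```

With a true slope of 0.2 against unit-variance noise, the model explains only a few percent of the variance, yet the large sample makes the slope estimate highly significant.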