Linear regression is frequently the go-to, most practical approach to modelling the relationship between two or more variables. When we are convinced that a cause-and-effect relationship can be observed between variables, we designate a dependent variable and link it to one or more explanatory variables (or independent variables). Below, for example, we believe that the number of students per teacher influences test scores. The case of one explanatory variable is referred to as simple linear regression; with more than one explanatory variable, the process is referred to as multiple linear regression. Here, using Excel and R, I set up a simple regression. Later, I will use an OLS model to measure credit risk and also to interpolate asset volatility.
Please refer to https://www.econometrics-with-r.org/4-lrwor.html . Introduction to Econometrics with R is one of the more practical combinations of worked examples and econometric analysis, and I strongly recommend it. Please use the link to access the spreadsheet workings. The text is free but, more importantly, it exploits interactive learning that blends R code with examples provided in the celebrated Stock & Watson (2015). Set up as an empirical companion, the interactive script permits a reproducible, research-report style: it enables students not only to learn how the results of case studies can be replicated with R, but also strengthens their ability to apply the newly acquired skills in other empirical applications. Below, in the video clips, I break apart some basic examples in Excel and then follow Christoph Hanck, Martin Arnold, Alexander Gerber and Martin Schmelzer for the rest.
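In the notation of Stock & Watson, the simple linear regression estimated below is Testscore = β0 + β1 × STR + u, where β0 is the intercept, β1 the effect of the student-teacher ratio (STR) on test scores, and u the error term capturing everything else that influences scores.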
The following R code was used to output the Ordinary Least Squares parameter estimates. Compare the confidence intervals with those in the spreadsheet.
# Chapter 5 Introduction to Econometrics with R but with
# simple regression
# Numbers changed for hypothesis testing
# https://www.econometrics-with-r.org/4-lrwor.html
# Create sample data
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)
# Print out sample data
STR
Testscore
# create a scatterplot of the data
plot(Testscore ~ STR)
# estimate the model and assign the result to linear_model
linear_model <- lm(Testscore ~ STR)
# print the standard output of the estimated lm object to the console
linear_model
summary(linear_model)
plot(Testscore ~ STR,
     main = "Scatterplot of Testscore and STR",
     xlab = "STR (X)",
     ylab = "Testscore (Y)",
     xlim = c(10, 30),
     ylim = c(600, 720))
# add the regression line
abline(linear_model)
# coefficient matrix: estimates, standard errors, t-statistics and p-values
summary(linear_model)$coef
# residuals and residual degrees of freedom (n - k - 1 = 5)
residuals(linear_model)
linear_model$df.residual
# β1 ∼ t(5): p-value for a two-sided significance test of H0: β1 = 0
2 * pt(-2.968015 / 1.965646, df = 5)
# with only 5 degrees of freedom the t distribution is not close to normal,
# so the normal approximation understates the p-value
2 * pnorm(-2.968015 / 1.965646)
# compute 95% confidence interval for coefficients in 'linear_model'
confint(linear_model)
# compute 95% confidence interval for coefficients in 'linear_model' by hand
lm_summ <- summary(linear_model)
c("lower" = lm_summ$coef[2,1] - qt(0.975, df = lm_summ$df[2]) * lm_summ$coef[2, 2],
"upper" = lm_summ$coef[2,1] + qt(0.975, df = lm_summ$df[2]) * lm_summ$coef[2, 2])
The R code below works out key metrics by hand, including the sum of squared residuals (SSR), the total sum of squares (TSS) and the explained sum of squares (ESS).
# Chapter 4 Introduction to Econometrics with R
# https://www.econometrics-with-r.org/4-lrwor.html
# Create sample data
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)
# Print out sample data
STR
Testscore
# create a scatterplot of the data
plot(Testscore ~ STR)
# estimate the model and assign the result to linear_model
linear_model <- lm(Testscore ~ STR)
# print the standard output of the estimated lm object to the console
linear_model
summary(linear_model)
plot(Testscore ~ STR,
     main = "Scatterplot of Testscore and STR",
     xlab = "STR (X)",
     ylab = "Testscore (Y)",
     xlim = c(10, 30),
     ylim = c(600, 720))
# add the regression line
abline(linear_model)
summary(linear_model)$coef
residuals(linear_model)
# ANOVA table: decomposition of the variation in Testscore
anova(linear_model)
# Manual Estimation
# define the components
n <- 7 # number of observations (rows)
k <- 1 # number of regressors
y_mean <- mean(Testscore) # mean of Testscore
SSR <- sum(residuals(linear_model)^2) # sum of squared residuals
TSS <- sum((Testscore - y_mean )^2) # total sum of squares
ESS <- sum((fitted(linear_model) - y_mean)^2) # explained sum of squares
# compute the measures
SER <- sqrt(1/(n-k-1) * SSR) # standard error of the regression
Rsq <- 1 - (SSR / TSS) # R^2
adj_Rsq <- 1 - (n-1)/(n-k-1) * SSR/TSS # adj. R^2
# print the measures to the console
c("SER" = SER, "R2" = Rsq, "Adj.R2" = adj_Rsq)
ANOVA is important for assessing the explanatory power of any model. Here, I take a slightly closer look:
t-values are used to gauge the statistical significance of individual explanatory variables. The F-test lets you compare two competing regression models in their capacity to "explain" the variance in the dependent variable, and it is mainly employed in ANOVA to assess the overall explanatory capacity of the model generated from the regression analysis; the sketch below ties it back to the SSR, TSS and ESS computed above.
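As a minimal sketch, reusing the SSR, ESS, n and k defined in the code above, the overall F-statistic can be reproduced by hand and compared with the output of anova() and summary():
# overall F-statistic: explained variance per regressor over
# residual variance per residual degree of freedom
F_stat <- (ESS / k) / (SSR / (n - k - 1))
F_stat
# p-value from the F(k, n - k - 1) distribution
pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
# compare with anova(linear_model) and the F-statistic in summary(linear_model)
summary(linear_model)$fstatistic
In a simple regression the F-statistic is just the square of the slope's t-statistic, so this also agrees with the t-test carried out earlier.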
For further material, please see Econometrics Academy.