Linear regression is frequently the "go-to", most practical approach to modelling the relationship between two or more variables. When we are convinced that a cause-and-effect relationship can be observed between variables, we designate a dependent variable and link it to one or more explanatory variables (or independent variables). Below, for example, we believe that the number of students per teacher influences test scores. The case of one explanatory variable is referred to as simple linear regression; for more than one explanatory variable, the process is referred to as multiple linear regression. Here, using Excel and R, I set up a simple regression. Later, I will use an OLS model to measure credit risk and also to interpolate asset volatility. Please refer to https://www.econometrics-with-r.org/4-lrwor.html . Introduction to Econometrics with R is one of the more practical combinations of worked examples and econometric analysis, and I strongly recommend it. Please use the link to access the spreadsheet workings. The text is free but, more importantly, exploits interactive learning that blends R code with examples from the celebrated Stock & Watson (2015). Set up as an empirical companion, the interactive script permits a reproducible, research-report style and enables students not only to learn how the results of case studies can be replicated with R but also to strengthen their ability to apply the newly acquired skills in other empirical applications. In the video clips below, I break apart some basic examples in Excel and then follow Christoph Hanck, Martin Arnold, Alexander Gerber and Martin Schmelzer for the rest.
The following R code was used to output the Ordinary Least Squares parameter estimates. Compare the confidence intervals with the spreadsheet.
# Chapter 5 Introduction to Econometrics with R, but with
# simple regression
# Numbers changed for hypothesis testing
# https://www.econometrics-with-r.org/4-lrwor.html

# Create sample data
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)

# Print out sample data
STR
Testscore

# create a scatterplot of the data
plot(Testscore ~ STR)

# estimate the model and assign the result to linear_model
linear_model <- lm(Testscore ~ STR)

# print the standard output of the estimated lm object to the console
linear_model
summary(linear_model)

plot(Testscore ~ STR,
     main = "Scatterplot of Testscore and STR",
     xlab = "STR (X)",
     ylab = "Testscore (Y)",
     xlim = c(10, 30),
     ylim = c(600, 720))

# add the regression line
abline(linear_model)

summary(linear_model)$coef
residuals(linear_model)
linear_model$df.residual

# beta_1 ~ t_5: p-value for a two-sided significance test
2 * pt(-2.968015 / 1.965646, df = 5)

# not close to the normal approximation
2 * pnorm(-2.968015 / 1.965646)

# compute 95% confidence interval for coefficients in 'linear_model'
confint(linear_model)

# compute 95% confidence interval for coefficients in 'linear_model' by hand
lm_summ <- summary(linear_model)
c("lower" = lm_summ$coef[2, 1] - qt(0.975, df = lm_summ$df[2]) * lm_summ$coef[2, 2],
  "upper" = lm_summ$coef[2, 1] + qt(0.975, df = lm_summ$df[2]) * lm_summ$coef[2, 2])

The R code below provides a manual working of key metrics, including SSR, TSS and ESS.
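The slope and intercept reported by lm() can also be checked against the textbook OLS formulas. A minimal sketch using the same sample data, where beta_1 = cov(X, Y) / var(X) and beta_0 = mean(Y) - beta_1 * mean(X):

```r
# same sample data as in the listing above
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)
linear_model <- lm(Testscore ~ STR)

# OLS estimates by hand
beta_1 <- cov(STR, Testscore) / var(STR)              # slope
beta_0 <- mean(Testscore) - beta_1 * mean(STR)        # intercept

# agrees with the coefficients from lm()
all.equal(unname(coef(linear_model)), c(beta_0, beta_1))
```

The same formulas are what the spreadsheet's SLOPE() and INTERCEPT() functions compute, so all three routes should agree.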
# Chapter 4 Introduction to Econometrics with R
# https://www.econometrics-with-r.org/4-lrwor.html

# Create sample data
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)

# Print out sample data
STR
Testscore

# create a scatterplot of the data
plot(Testscore ~ STR)

# estimate the model and assign the result to linear_model
linear_model <- lm(Testscore ~ STR)

# print the standard output of the estimated lm object to the console
linear_model
summary(linear_model)

plot(Testscore ~ STR,
     main = "Scatterplot of Testscore and STR",
     xlab = "STR (X)",
     ylab = "Testscore (Y)",
     xlim = c(10, 30),
     ylim = c(600, 720))

# add the regression line
abline(linear_model)

summary(linear_model)$coef
residuals(linear_model)
anova(linear_model)

# Manual estimation
# define the components
n <- 7  # number of observations (rows)
k <- 1  # number of regressors
y_mean <- mean(Testscore)  # mean of test scores

SSR <- sum(residuals(linear_model)^2)          # sum of squared residuals
TSS <- sum((Testscore - y_mean)^2)             # total sum of squares
ESS <- sum((fitted(linear_model) - y_mean)^2)  # explained sum of squares

# compute the measures
SER <- sqrt(1 / (n - k - 1) * SSR)                 # standard error of the regression
Rsq <- 1 - (SSR / TSS)                             # R^2
adj_Rsq <- 1 - (n - 1) / (n - k - 1) * SSR / TSS   # adj. R^2

# print the measures to the console
c("SER" = SER, "R2" = Rsq, "Adj.R2" = adj_Rsq)

ANOVA is important for assessing the explanatory power of any model. Here, I take a slightly closer look:
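The F statistic that anova() reports can be reproduced from the same components, since F = (ESS / k) / (SSR / (n - k - 1)) for the overall regression. A short sketch, reusing the data and variable names from the listing above:

```r
# same sample data as in the listing above
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)
linear_model <- lm(Testscore ~ STR)

n <- 7  # number of observations
k <- 1  # number of regressors
SSR <- sum(residuals(linear_model)^2)                     # sum of squared residuals
ESS <- sum((fitted(linear_model) - mean(Testscore))^2)    # explained sum of squares

# F = (explained variance per regressor) / (residual variance per df)
F_manual <- (ESS / k) / (SSR / (n - k - 1))
F_anova  <- anova(linear_model)[1, "F value"]
all.equal(F_manual, F_anova)
```

This is the same F statistic printed at the foot of summary(linear_model), which tests the null hypothesis that all slope coefficients are zero.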
t-values are used to gauge the statistical significance of individual explanatory variables. The F-test lets you compare two competing regression models in their capacity to "explain" the variance in the dependent variable. The F-test is mainly employed in ANOVA to assess the overall explanatory power of the model generated from the regression analysis.
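To illustrate the model-comparison use of the F-test, the sketch below fits a second, larger model and lets anova() carry out the F-test of the restriction. The quadratic term I(STR^2) is purely illustrative (it is not part of the example above), but it shows the mechanics of comparing two nested models:

```r
# same sample data as in the listings above
STR <- c(15, 17, 19, 20, 22, 23.5, 25)
Testscore <- c(680, 640, 670, 660, 630, 660, 635)

restricted <- lm(Testscore ~ STR)             # the simple model used above
full       <- lm(Testscore ~ STR + I(STR^2))  # adds an illustrative quadratic term

# anova() on two nested models performs the F-test of the restriction:
# does the extra regressor significantly reduce the residual sum of squares?
anova(restricted, full)
```

With such a small sample the added term is unlikely to be significant; the point is only the workflow, which carries over directly to larger multiple-regression models.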