This assignment comes from my Econometrics class. The objective was to compare bootstrap standard errors with the standard errors obtained from a forbidden regression and from a three-step instrumental variable procedure.
To achieve this, we utilized the 'k401ksubs' dataset, which can be accessed from the 'wooldridge' package in R.
Imagine that we want to see if there is a trade-off or complementarity between participating in different retirement saving plans (here, these plans are 401k and IRA). So, the following is a structural equation we will be estimating:
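Written out, and assuming that the dependent variable is the IRA participation indicator `pira` and that the covariates are income (`inc`) and age (`age`), the two named in the results discussion, the equation could take the form below. The error term u and the labels β2 and β3 are assumed notation.

$$
\text{pira} = \beta_0 + \beta_1\,\text{p401k} + \beta_2\,\text{inc} + \beta_3\,\text{age} + u
$$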
If there is a trade-off, then β1 < 0. If there is complementarity, then β1 > 0. Participation in a 401k plan may be endogenous, since it can be driven by the same unobserved factors that drive participation in an IRA, and since people are likely to decide about participating in either plan simultaneously (rather than independently). So, we need an instrument for p401k. One such instrument is 401k plan eligibility (e401k), since it is excluded, relevant (if someone is not eligible, they cannot participate), and valid (whether an employee is eligible is determined by their employer and federal regulations, and is not driven by individual-level factors).
Suppose we have the following model:
where d is a binary endogenous variable, and x is a vector of exogenous covariates. Suppose we also have an instrumental variable z (or a set of instrumental variables) that is excluded, relevant, and valid.
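In symbols, a model of this kind can be sketched as follows. The outcome y, the error term u, and the coefficient vector γ are assumed notation; d, x, and z are as described above.

$$
y = \beta_0 + \beta_1 d + x'\gamma + u, \qquad \mathrm{Cov}(d, u) \neq 0, \qquad \mathrm{Cov}(z, u) = 0, \qquad \mathrm{Cov}(z, d) \neq 0
$$

The first condition says d is endogenous, the second that z is valid, and the third that z is relevant. Exclusion means z does not appear on the right-hand side of the equation for y.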
Since d is binary, we may be tempted to estimate a probit (or logit) model for d as a function of z and x, collect predicted values, replace the endogenous variable d in the structural model with these predicted values, and estimate the resulting model by OLS.
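However, this 2-step procedure is what MIT Professor Jerry Hausman called a forbidden regression. The reason is that the probit (or logit) model may be misspecified. Unlike the residuals from a linear first stage, the residuals from a nonlinear first stage are not guaranteed to be uncorrelated with the fitted values and the covariates, so step 2 is consistent only if the first-stage model is exactly right.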
Assuming we use the 2-step procedure, we will obtain the following results:
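A minimal sketch of the two-step (forbidden) regression in base R, assuming the structural equation uses `pira` as the outcome and only `inc` and `age` as covariates. The function name `forbidden_fit` is hypothetical, and the usage part runs only if the 'wooldridge' package is installed.

```r
# Two-step "forbidden" regression: probit first stage, then OLS with the
# fitted probabilities plugged in for the endogenous dummy.
# Assumption: outcome is pira, covariates are inc and age only.
forbidden_fit <- function(dat) {
  # Step 1: probit for participation on the instrument and the covariates
  first <- glm(p401k ~ e401k + inc + age,
               family = binomial(link = "probit"), data = dat)
  dat$p401k_hat <- fitted(first)  # predicted probabilities

  # Step 2: OLS with p401k replaced by its predicted probability
  lm(pira ~ p401k_hat + inc + age, data = dat)
}

# Usage on the assignment data (skipped if 'wooldridge' is not installed)
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("k401ksubs", package = "wooldridge")
  print(summary(forbidden_fit(k401ksubs))$coefficients)
}
```

Because ineligible workers cannot participate, `glm` may warn that some fitted probabilities are numerically 0 or 1. The standard errors printed by `summary` treat the predicted probabilities as ordinary data, so they ignore the estimation error from step 1.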
The alternative is a 3-step procedure.
Step 1: Estimate a probit (or logit) model for d as a function of z and x and collect the predicted values (probabilities, in this case).
Steps 2 and 3: Use these predicted values as instruments for d in a regular 2SLS estimation of the structural model. We will use only the probit model in this case:
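The three steps can be sketched in base R as follows, again assuming `pira` as the outcome and `inc` and `age` as the only covariates, and assuming the data contain no missing values. The helpers `tsls` and `three_step_fit` are hypothetical names, and the 2SLS part is written with plain matrix algebra. A packaged routine such as `ivreg` from the 'AER' package would be the more usual choice in practice.

```r
# 2SLS by matrix algebra. X holds the regressors, Z the instruments
# (both include the intercept column).
tsls <- function(y, X, Z) {
  # Linear first stage: project every column of X on Z
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  # Second stage: coefficients from the projected regressors
  beta <- drop(solve(crossprod(X_hat, X), crossprod(X_hat, y)))
  names(beta) <- colnames(X)
  # Residuals use the ORIGINAL regressors, not the projected ones
  res <- y - drop(X %*% beta)
  sigma2 <- sum(res^2) / (nrow(X) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))
  names(se) <- colnames(X)
  list(coef = beta, se = se)
}

three_step_fit <- function(dat) {
  # Step 1: probit for participation; keep the predicted probabilities
  first <- glm(p401k ~ e401k + inc + age,
               family = binomial(link = "probit"), data = dat)
  p_hat <- fitted(first)

  # Steps 2 and 3: 2SLS with p_hat as the instrument for p401k
  X <- cbind("(Intercept)" = 1, p401k = dat$p401k, inc = dat$inc, age = dat$age)
  Z <- cbind("(Intercept)" = 1, p_hat = p_hat, inc = dat$inc, age = dat$age)
  tsls(dat$pira, X, Z)
}

# Usage on the assignment data (skipped if 'wooldridge' is not installed)
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("k401ksubs", package = "wooldridge")
  fit <- three_step_fit(k401ksubs)
  print(cbind(estimate = fit$coef, se = fit$se))
}
```

The predicted probability enters only as an instrument, so the linear first stage inside `tsls` keeps its residuals uncorrelated with the instruments even if the probit is misspecified.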
Although the two procedures give similar results here, the 3-step procedure is recommended. In some cases, however, the reported standard errors may not be correct. This is where the bootstrap comes in.
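A minimal sketch of a nonparametric (pairs) bootstrap for the standard errors, assuming the two-step procedure is the one being bootstrapped and that the covariates are `inc` and `age`. The names `forbidden_fit` and `boot_se` and the number of replications are hypothetical choices. Each replication resamples rows with replacement and repeats both steps, so the first-stage estimation error is reflected in the spread of the estimates.

```r
# Two-step estimator, repeated here so the block is self-contained.
# Assumption: outcome is pira, covariates are inc and age only.
forbidden_fit <- function(dat) {
  first <- glm(p401k ~ e401k + inc + age,
               family = binomial(link = "probit"), data = dat)
  dat$p401k_hat <- fitted(first)
  lm(pira ~ p401k_hat + inc + age, data = dat)
}

# Pairs bootstrap: resample rows, re-run the whole estimator, and take the
# standard deviation of each coefficient across replications.
boot_se <- function(dat, estimator, B = 500) {
  n <- nrow(dat)
  draws <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    estimator(dat[idx, , drop = FALSE])
  })
  apply(draws, 1, sd)  # one standard error per coefficient
}

# Usage on the assignment data (skipped if 'wooldridge' is not installed)
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("k401ksubs", package = "wooldridge")
  set.seed(123)  # for reproducible resampling
  se_boot <- boot_se(k401ksubs, function(d) coef(forbidden_fit(d)), B = 200)
  se_ols <- summary(forbidden_fit(k401ksubs))$coefficients[, "Std. Error"]
  print(cbind(bootstrap = se_boot, analytic = se_ols))
}
```

Passing the estimator as a function means the same `boot_se` helper could be reused for the 3-step procedure by supplying a function that returns its coefficient vector.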
All the estimates are very similar. In the case of income and age, the rounded estimates are the same.
The standard errors for Income and Age are found to be similar. However, when considering the intercept, the bootstrap method yields smaller standard errors. Conversely, for the predicted value of p401k, the bootstrap method results in larger standard errors.