Problems - Ch3

  1. Create the following simulated data.

    1. x ~ a binary variable, where Prob(x=1) = 0.2 (20% are 1s)

    2. w ~ U[0,1] + d*x, where d = 3 (a uniform draw between 0 and 1, plus the effect of x)

    3. z ~ U[0,1] + f*x, where f = 2

    4. e ~ N(0,1) (normally distributed with a variance of 1)

    5. y = a + b*x + c*w + g*z + e, where a = 1, b = 0, c = -3, g = -2

    6. N = 50 (50 data points)

    7. Run ordinary least squares on the relationship between y and x (y as a function of x). Present the estimate of b. Discuss why it is or is not close to the true value.

    8. Draw the causal diagram (DAG) representing this data-generating process.

    9. Run ordinary least squares on the relationship between y and all of x, w and z. Present the estimate of b. Discuss why it is or is not close to the true value.

    10. Can you show that b must be equal to zero? Show your test.
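A minimal sketch of Problem 1 in Python with NumPy. The helper names `simulate_q1` and `ols` and the seed are hypothetical choices made for illustration. Because x moves y directly and also through w and z, the short regression of y on x estimates the total effect b + c*d + g*f = 0 + (-3)(3) + (-2)(2) = -13 rather than b. Adding w and z as controls targets b itself, but with N = 50 the standard error on b is large. The t-statistic shown is one possible approach to question 10, not the only one.

```python
import numpy as np


def simulate_q1(n, rng):
    """Simulate the Problem 1 data-generating process (parameters from the text)."""
    a, b, c, g, d, f = 1.0, 0.0, -3.0, -2.0, 3.0, 2.0
    x = (rng.random(n) < 0.2).astype(float)   # binary, Prob(x = 1) = 0.2
    w = rng.random(n) + d * x                 # U[0,1] plus the effect of x
    z = rng.random(n) + f * x                 # U[0,1] plus the effect of x
    e = rng.normal(0.0, 1.0, n)               # N(0, 1) error
    y = a + b * x + c * w + g * z + e
    return x, w, z, y


def ols(y, *regressors):
    """OLS with an intercept; returns coefficients and classical standard errors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


rng = np.random.default_rng(42)   # arbitrary seed
x, w, z, y = simulate_q1(50, rng)

beta_short, _ = ols(y, x)              # y on x only
beta_long, se_long = ols(y, x, w, z)   # y on x, w and z

b_short = beta_short[1]
b_long = beta_long[1]
t_stat = b_long / se_long[1]           # t-statistic for H0: b = 0
print(f"short regression b: {b_short:.2f}")
print(f"long regression b:  {b_long:.2f} (se {se_long[1]:.2f}, t {t_stat:.2f})")
```

Rerunning with a larger `n` shows how the precision of the long-regression estimate of b improves as the sample grows.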

  2. Create the following simulated data.

    1. z ~ U[0,1] (uniformly distributed between 0 and 1)

    2. u ~ N(0,2) (normally distributed with a variance of 2)

    3. e ~ N(0,1)

    4. x = c + d*u + f*z + e, where c = -4, d = -2, f = 2

    5. y = a + b*x + u, where a = 3, b = -2.

    6. N = 1000 (1000 data points)

    7. Draw a DAG for this data-generating process.

    8. Run ordinary least squares on the relationship between y and x (y as a function of x). Present estimates for a and b. Discuss why they are or are not close to the true values.

    9. Run OLS on the relationship between x and z (x as a function of z), and present estimates of c and f.

    10. Run OLS on the relationship between y and z. How should the coefficient estimate on z be interpreted?

    11. Use the results in items 9 and 10 (the first stage and the reduced form) to determine the IV estimate of b. Discuss why this estimate is or is not close to the true value of b.
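A sketch of Problem 2 under the same assumptions: Python/NumPy instead of R, a hypothetical helper `ols`, and an arbitrary seed. N(0,2) is specified by its variance, so the standard deviation passed to the generator is sqrt(2). The IV estimate is formed as the ratio of the reduced-form slope (y on z) to the first-stage slope (x on z). This is valid here because z shifts x and is unrelated to u.

```python
import numpy as np


def ols(y, x):
    """Bivariate OLS with an intercept; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(2024)   # arbitrary seed
n = 1000
a, b, c, d, f = 3.0, -2.0, -4.0, -2.0, 2.0

z = rng.random(n)                        # U[0,1]
u = rng.normal(0.0, np.sqrt(2.0), n)     # variance 2, so sd = sqrt(2)
e = rng.normal(0.0, 1.0, n)
x = c + d * u + f * z + e
y = a + b * x + u                        # u is unobserved and also drives x

a_ols, b_ols = ols(y, x)        # biased: x is correlated with u
c_hat, f_hat = ols(x, z)        # first stage
_, rf_z = ols(y, z)             # reduced form, estimates b * f
b_iv = rf_z / f_hat             # ratio of reduced form to first stage

print(f"OLS: a = {a_ols:.2f}, b = {b_ols:.2f}")
print(f"first stage: c = {c_hat:.2f}, f = {f_hat:.2f}")
print(f"reduced-form slope = {rf_z:.2f}, IV b = {b_iv:.2f}")
```

For reference, the population OLS slope in this design is b + d*Var(u)/Var(x) = -2 + (-2)(2)/(28/3) ≈ -2.43, since Var(x) = d^2*Var(u) + f^2*Var(z) + Var(e) = 8 + 1/3 + 1 = 28/3. The IV ratio should land near -2.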

  3. Using matrix algebra, derive the IV estimate.
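One standard derivation, sketched for the just-identified case. It assumes Z has the same number of columns as X (including the constant) and that Z'X is invertible:

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad E[z_i \varepsilon_i] = 0 \\
Z'(y - X\hat\beta_{IV}) &= 0 \quad \text{(sample analogue of the moment condition)} \\
Z'X\,\hat\beta_{IV} &= Z'y \\
\hat\beta_{IV} &= (Z'X)^{-1} Z'y
\end{aligned}
```

With X = [1, x] and Z = [1, z], the slope element of this expression equals the sample covariance of z and y divided by the sample covariance of z and x. That is the reduced-form slope divided by the first-stage slope.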

  4. Create the following simulated data.

    1. x ~ a binary variable, where Prob(x=1) = 0.2 (20% are 1s)

    2. w ~ U[0,1] + d*x, where d = 3 (a uniform draw between 0 and 1, plus the effect of x)

    3. z ~ U[0,1] + f*x, where f = 2

    4. e ~ N(0,1) (normally distributed with a variance of 1)

    5. y = a + b*x + c*w + g*z + e, where a = 1, b = 0, c = -3, g = -2

    6. N = 1000 (1000 data points)

    7. Calculate the OLS estimate in R using matrix algebra.

    8. Calculate the IV estimate in R using matrix algebra.

    9. Compare your answers in 7 and 8 with your answers to Problem 2.
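A sketch of the matrix-algebra OLS and IV calculations in Problem 4, in Python/NumPy rather than the R the exercise asks for. The function names and seed are hypothetical. `ols_matrix` implements (X'X)^{-1}X'y and `iv_matrix` implements the just-identified (Z'X)^{-1}Z'y. The problem text does not say which instruments to use with this data, so the IV function is checked two ways. With Z = X it must reproduce OLS. On data simulated as in Problem 2, with the assumed instrument matrix Z = [1, z], it should land near b = -2.

```python
import numpy as np


def ols_matrix(y, X):
    """OLS via the normal equations: beta = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def iv_matrix(y, X, Z):
    """Just-identified IV: beta = (Z'X)^{-1} Z'y (Z has as many columns as X)."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(7)   # arbitrary seed
n = 1000

# Problem 4 data (same process as Problem 1, with N = 1000).
x = (rng.random(n) < 0.2).astype(float)
w = rng.random(n) + 3.0 * x
z = rng.random(n) + 2.0 * x
e = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.0 * x - 3.0 * w - 2.0 * z + e

X = np.column_stack([np.ones(n), x, w, z])
beta_ols = ols_matrix(y, X)          # estimates of (a, b, c, g)

# Sanity check: using the regressors as their own instruments reproduces OLS.
beta_iv_self = iv_matrix(y, X, X)

# Assumed illustration: IV on data simulated as in Problem 2, with Z = [1, z2].
z2 = rng.random(n)
u2 = rng.normal(0.0, np.sqrt(2.0), n)
x2 = -4.0 - 2.0 * u2 + 2.0 * z2 + rng.normal(0.0, 1.0, n)
y2 = 3.0 - 2.0 * x2 + u2
X2 = np.column_stack([np.ones(n), x2])
Z2 = np.column_stack([np.ones(n), z2])
beta_iv = iv_matrix(y2, X2, Z2)      # estimates of (a, b) in Problem 2

print("OLS (a, b, c, g):", np.round(beta_ols, 2))
print("IV on Problem 2 data (a, b):", np.round(beta_iv, 2))
```

The code uses `np.linalg.solve` rather than an explicit inverse. This gives the same estimator with better numerical behavior; in R the analogous call is `solve(A, b)`.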

  5. Download the data for "Using Geographic Variation in College Proximity to Estimate Returns to Schooling" by David Card (NBER Working Paper 4483, http://www.nber.org/papers/w4483). The data are available here: Data (Google Sheets)

    1. Replicate Table 3, Panel A (the top part), to the extent the data allow.

    2. How should we interpret the estimate for education in row (2) (0.132)?

  6. Create a two-stage IV estimator using matrix algebra, and use the bootstrap to create a measure of uncertainty around the estimates. Compare the results with those from the estimators used in (4) and (5).
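The Card data are not reproduced here, so this sketch runs a matrix-algebra two-stage least squares (2SLS) estimator with a nonparametric, row-resampling bootstrap on data simulated as in Problem 2. Python/NumPy, the function names, the seed, and the 500 bootstrap draws are all illustrative assumptions. In the just-identified case 2SLS coincides with (Z'X)^{-1}Z'y, but the two-stage form also accepts more instruments than regressors.

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares: project X on Z, then use the fitted values."""
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ X)    # first-stage coefficients
    X_hat = Z @ gamma                            # fitted (projected) regressors
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


def bootstrap_tsls(y, X, Z, n_boot, rng):
    """Nonparametric bootstrap: resample rows with replacement and re-estimate."""
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[i] = tsls(y[idx], X[idx], Z[idx])
    return draws


rng = np.random.default_rng(11)   # arbitrary seed
n = 1000
z = rng.random(n)
u = rng.normal(0.0, np.sqrt(2.0), n)
x = -4.0 - 2.0 * u + 2.0 * z + rng.normal(0.0, 1.0, n)
y = 3.0 - 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_hat = tsls(y, X, Z)
draws = bootstrap_tsls(y, X, Z, n_boot=500, rng=rng)
se_boot = draws.std(axis=0, ddof=1)                # bootstrap standard errors
ci_b = np.percentile(draws[:, 1], [2.5, 97.5])     # 95% percentile interval for b

print("2SLS (a, b):", np.round(beta_hat, 3))
print("bootstrap SE:", np.round(se_boot, 3))
print("95% percentile interval for b:", np.round(ci_b, 3))
```

Resampling whole rows keeps each observation's y, x and z together, which preserves the correlation between x and the unobserved u in every bootstrap sample.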