Complete Subset Averaging with Many Instruments

with Youngki Shin (McMaster U), The Econometrics Journal (2021) 24, 290-314.

We propose a two-stage least squares (2SLS) estimator whose first stage is the equal-weight average over the complete subset of all combinations of k instruments among the K available, which we call the complete subset averaging (CSA) 2SLS. The approximate mean squared error (MSE) is derived as a function of the subset size k by the Nagar (1959) expansion. The subset size is chosen by minimizing the sample counterpart of the approximate MSE. We show that this method is asymptotically optimal among the class of estimators with different subset sizes. To deal with averaging over a growing set of irrelevant instruments, we generalize the approximate MSE and find that the optimal k is then larger than it would be otherwise. An extensive simulation experiment shows that the CSA-2SLS estimator outperforms the alternative estimators when instruments are correlated. As an empirical illustration, we estimate the logistic demand function of Berry, Levinsohn, and Pakes (1995) and find that the CSA-2SLS estimate is better supported by economic theory than the alternative estimates.
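The averaging idea behind the CSA first stage can be sketched in Python for a fixed subset size k. This is a minimal sketch, assuming a single endogenous regressor `x`, an n-by-K instrument matrix `Z` with no intercept or exogenous controls, and a standard 2SLS second stage that uses the averaged projection in place of the usual one. The function names `csa_first_stage` and `csa_2sls` are hypothetical and are not taken from the csa2sls Stata package. The sketch does not choose k; the paper selects it by minimizing the sample Nagar-type MSE criterion, which is not reproduced here.

```python
import itertools

import numpy as np


def csa_first_stage(x, Z, k):
    """Average the first-stage fitted values over all k-subsets of the K columns of Z.

    Equivalent to applying the average of the projection matrices P_s,
    one for each subset s of size k, to x.
    """
    n, K = Z.shape
    if not 1 <= k <= K:
        raise ValueError("k must satisfy 1 <= k <= K")
    subsets = list(itertools.combinations(range(K), k))
    fitted = np.zeros(n)
    for cols in subsets:
        Zs = Z[:, list(cols)]
        # OLS of x on the instruments in this subset.
        coef, *_ = np.linalg.lstsq(Zs, x, rcond=None)
        fitted += Zs @ coef
    # Equal weight 1 / C(K, k) on every subset.
    return fitted / len(subsets)


def csa_2sls(y, x, Z, k):
    """2SLS with one endogenous regressor using the CSA first stage.

    With xhat = P_bar x and P_bar symmetric, this equals
    (x' P_bar x)^{-1} x' P_bar y.
    """
    xhat = csa_first_stage(x, Z, k)
    return float(xhat @ y) / float(xhat @ x)


if __name__ == "__main__":
    # Simulated example (assumed data-generating process, true beta = 1).
    rng = np.random.default_rng(0)
    n, K = 2000, 4
    Z = rng.standard_normal((n, K))
    v = rng.standard_normal(n)
    u = 0.5 * v + rng.standard_normal(n)  # endogeneity via corr(u, v)
    x = Z @ np.full(K, 0.5) + v
    y = 1.0 * x + u
    for k in range(1, K + 1):
        print(k, round(csa_2sls(y, x, Z, k), 3))
```

Setting k = K leaves a single subset, so the sketch reduces to ordinary 2SLS with all instruments. For smaller k the loop visits C(K, k) subsets, a number that grows quickly with K, so this brute-force version is only practical for a modest number of instruments.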


  • pdf, arXiv

  • Stata code: csa2sls [instructions]

  • Previously circulated under the title "Optimal Estimation with Complete Subsets of Instruments"

Figure 1 from the paper shows the correlation structure among the instruments. The CSA-2SLS method works best for correlated instruments, as in panels (b) and (c).