Stata Command: crhdreg

This command performs double/debiased machine learning estimation of regression models and IV regression models under clustering. The sampling environments it accommodates are i.i.d. sampling, one-way cluster sampling, and two-way cluster sampling. The four tables below show the results of estimating the demand system under i.i.d. sampling (1st table), one-way clustering by markets (2nd table), one-way clustering by products (3rd table), and two-way clustering by both markets and products (4th table).

Double/debiased machine learning under i.i.d. sampling

Double/debiased machine learning under clustering by markets

Double/debiased machine learning under clustering by products

Double/debiased machine learning under two-way clustering by markets and products

Installation:

. ssc install crhdreg

Usage:

. use "blp.dta"

. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt)

. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(market)

. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(model)

. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(market) cluster2(model)

Help:

. help crhdreg

Reference: Chiang, H.D., K. Kato, Y. Ma & Y. Sasaki (2021) Multiway Cluster Robust Double/Debiased Machine Learning. Journal of Business & Economic Statistics, 40 (3), pp. 1046-1056.

Download the manuscript and package in preparation for The Stata Journal (not yet submitted; comments are welcome)

Title
crhdreg -- Executes estimation of high-dimensional regressions based on cluster-robust double/debiased machine learning.
Syntax
crhdreg depvar indepvarlist1 indepvarlist2 [if] [in] [, cluster1(varname) cluster2(varname) iv(varname) dimension(real) folds(real) resample(real) median alpha(real) tol(real) maxiter(real)]
Description
crhdreg executes estimation of high-dimensional regression and high-dimensional IV regression with one-way or two-way cluster-robust standard errors based on Chiang, Kato, Ma and Sasaki (2021). The high-dimensional regression estimation is executed by the (multiway) cluster-robust double/debiased machine learning with the high-dimensional nuisance parameters estimated via the elastic net (LASSO by default).
Options
cluster1(varname) sets the variable to construct the first cluster dimension in one- or two-way clustering. Not calling this option leads to an execution of the high-dimensional regression or the high-dimensional IV regression without clustering.
cluster2(varname) sets the variable to construct the second cluster dimension in two-way clustering. If cluster1 is called but cluster2 is not called, then the command executes the high-dimensional regression or the high-dimensional IV regression with only one way of clustering based on the variable set with the cluster1 option.
iv(varname) sets the instrumental variable when the first variable in indepvarlist1 is endogenous. Calling this option runs the high-dimensional IV regression, while not calling it leads to an execution of the high-dimensional regression.
dimension(real) sets the number of variables in indepvarlist1, the coefficients of which are to be displayed in the output table. The default value is dimension(1). It has to be a positive integer no larger than the total number of variables included in indepvarlist1 and indepvarlist2.
folds(real) sets the number K of folds for the cross fitting in the double/debiased machine learning. The default value is folds(5) under no clustering or one-way clustering. The default value is folds(3) under two-way clustering. It has to be a positive integer greater than 1.
resample(real) sets the number of resampling repetitions for a finite-sample adjustment of the double/debiased machine learning. The default value is resample(10). It has to be a positive integer.
median sets the indicator that the finite-sample adjustment uses the median of resampled estimates. Not calling this option leads to the use of the mean of resampled estimates.
alpha(real) sets the penalty weight in the elastic net. The default value is alpha(1), in which case the elastic net is the LASSO (Least Absolute Shrinkage and Selection Operator). If this option is set to alpha(0), then the elastic net becomes the ridge regression. It has to be a real number between 0 and 1.
tol(real) sets the tolerance as a stopping criterion in the numerical solution to the elastic net. The default value is tol(0.000001). It has to be a strictly positive real number.
maxiter(real) sets the maximum number of iterations in the numerical solution to the elastic net. The default value is maxiter(1000). It has to be a natural number.
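As a point of orientation for the alpha() option, the elastic net objective is often written in the following form. This is a sketch under an assumed, glmnet-style parameterization; the help text does not state the exact scaling used by crhdreg, and the penalty level \lambda is a symbol introduced here for illustration, not a documented option.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i'\beta\bigr)^2
  \;+\; \lambda\Bigl[\alpha\,\lVert\beta\rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr],
  \qquad 0 \le \alpha \le 1 .
```

With \alpha = 1 only the \ell_1 term remains, which is the LASSO; with \alpha = 0 only the squared \ell_2 term remains, which is the ridge regression. This matches the two boundary cases described for alpha() above.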
Usage
Estimation of the partial effect of d on y controlling for 100 variables:
. crhdreg y d x1 ... x100
Cluster-robust standard error by the clustering variable g:
. crhdreg y d x1 ... x100, cluster1(g)
Two-way cluster-robust standard error by the clustering variables g1, g2:
. crhdreg y d x1 ... x100, cluster1(g1) cluster2(g2)
Instrumenting the endogenous variable d by z:
. crhdreg y d x1 ... x100, iv(z)
. crhdreg y d x1 ... x100, iv(z) cluster1(g)
. crhdreg y d x1 ... x100, iv(z) cluster1(g1) cluster2(g2)
Estimation of the partial effects of d1, d2, d3 on y controlling for 100 variables:
. crhdreg y d1 d2 d3 x1 ... x100, dimension(3)
. crhdreg y d1 d2 d3 x1 ... x100, dimension(3) cluster1(g)
. crhdreg y d1 d2 d3 x1 ... x100, dimension(3) cluster1(g1) cluster2(g2)
Instrumenting the endogenous variable d1 by z:
. crhdreg y d1 d2 d3 x1 ... x100, dimension(3) iv(z)
. crhdreg y d1 d2 d3 x1 ... x100, dimension(3) iv(z) cluster1(g)
etc.
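The tuning options listed under Options can be combined with the clustering and IV options in the same call. The lines below are a minimal sketch that uses only options named in the Syntax section. The variable names (y, d, z, g, g1, g2) are the placeholders used above, and the wildcard x* assumes, hypothetically, that the control variables all share the prefix x.

```stata
* Two-way clustering with 3 folds, 20 resampling repetitions aggregated
* by the median, and an elastic net weight between ridge (0) and LASSO (1)
crhdreg y d x*, cluster1(g1) cluster2(g2) folds(3) resample(20) median alpha(0.5)

* IV regression with one-way clustering, a tighter tolerance, and a larger
* iteration cap for the elastic net solver
crhdreg y d x*, iv(z) cluster1(g) tol(0.0000001) maxiter(5000)
```

The specific values are illustrative only. More resampling repetitions or a larger iteration cap can be expected to increase run time.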
Examples
Estimation of the demand system in the differentiated product markets:
. use "blp.dta"
No clustering:
. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt)
One-way clustering by market:
. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(market)
One-way clustering by product model:
. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(model)
Two-way clustering by market and product model:
. crhdreg share logprice hpwt* air* mpd* space*, iv(sumotherhpwt) cluster1(market) cluster2(model)

Stored results
crhdreg stores the following in e():
Scalars
    e(N)           observations
    e(ways)        ways of clustering
    e(G1)          cluster size in the first cluster dimension
    e(G2)          cluster size in the second cluster dimension
    e(dimD)        number of variables in indepvarlist1
    e(dimX)        number of variables in indepvarlist2
    e(K)           number of folds for the cross fitting
    e(alpha)       penalty weight in the elastic net
    e(fsa_n)       number of resampling repetitions for a finite-sample adjustment
Macros
    e(fsa_m)       mean or median for a finite-sample adjustment
    e(iv)          instrumental variable
    e(cluster1)    clustering variable in the first cluster dimension
    e(cluster2)    clustering variable in the second cluster dimension
    e(cmd)         crhdreg
    e(properties)  b V
Matrices
    e(b)           coefficient vector
    e(V)           variance-covariance matrix of the estimators
Functions
    e(sample)      marks estimation sample
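The stored results can be inspected with standard Stata tools after estimation. The following is a minimal sketch, assuming that e(V) follows the usual Stata convention so that the square root of its first diagonal element is the standard error of the first coefficient. The variable names and the x* wildcard are hypothetical placeholders, and the scalars theta and se are introduced here only for illustration.

```stata
* Hypothetical variable names, as in the Usage section
crhdreg y d x*, cluster1(g1) cluster2(g2)

* List everything the command left in e()
ereturn list

* Coefficient vector and variance-covariance matrix
matrix list e(b)
matrix list e(V)

* 95% normal-approximation confidence interval for the first coefficient
matrix b = e(b)
matrix V = e(V)
scalar theta = b[1,1]
scalar se    = sqrt(V[1,1])
display "estimate = " theta "   se = " se
display "95% CI   = [" theta - invnormal(0.975)*se ", " theta + invnormal(0.975)*se "]"
```

Copying e(b) and e(V) into ordinary matrices first keeps the values available even after a later estimation command overwrites e().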
Reference
Chiang, H.D., K. Kato, Y. Ma, and Y. Sasaki. 2021. Multiway Cluster Robust Double/Debiased Machine Learning. Journal of Business & Economic Statistics, 40 (3), pp. 1046-1056.
Authors
Harold D. Chiang, University of Wisconsin, Madison, WI.
Kengo Kato, Cornell University, Ithaca, NY.
Yukun Ma, Vanderbilt University, Nashville, TN.
Yuya Sasaki, Vanderbilt University, Nashville, TN.