Assistant professor, Department of Economics, University of California, Irvine

email: yingying.lee@uci.edu

3151 Social Science Plaza, University of California Irvine, Irvine, CA 92697-5100

*Working paper:*

**Efficient propensity score regression estimators of multivalued treatment effects for the treated** (2017). (pdf) Revision requested at the *Journal of Econometrics*

Matching is a widely used estimation method in program evaluation when treatment is assigned at random conditional on observable characteristics. When a multivalued treatment takes on more than two values, valid causal comparisons for the subpopulation that receives a particular treatment level are based on two propensity scores: one for the treated level and one for the counterfactual level. The main contribution of this paper is a set of propensity score regression estimators for a class of treatment effects for the treated that achieve the semiparametric efficiency bounds both when the propensity scores are unknown and when they are known. We derive the large sample distribution, which reveals how first-step estimation of the propensity score as a generated regressor affects asymptotic efficiency. We contribute to the binary treatment literature with a new propensity score regression estimator for the average/quantile treatment effect for the treated: the proposed efficient estimator matches on a normalized propensity score that is a combination of the true propensity score and its nonparametric estimate. Moreover, we formally show that the semiparametric efficiency bound is reduced by knowledge of the propensity scores for the treated levels, but is not affected by knowledge of the propensity score for the counterfactual level. A Monte Carlo experiment supports our theoretical findings.
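The idea of regressing on the propensity score can be illustrated for the binary case with a minimal Python sketch. It is not the efficient estimator proposed in the paper: it assumes the propensity score is known and takes only finitely many values, so the control regression on the score reduces to a cell mean. The function name and the toy data are hypothetical.

```python
import numpy as np

def att_by_score_regression(y, t, score):
    """Average effect for the treated: mean over treated units of
    Y minus the control regression E[Y | score, T = 0].

    Assumes (for illustration only) a known, discrete-valued score and
    that every score value seen among the treated also occurs among controls.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    score = np.asarray(score, dtype=float)
    treated, control = t == 1, t == 0
    # With finitely many score values the regression is a control cell mean.
    cell_mean = {s: y[control & (score == s)].mean()
                 for s in np.unique(score[treated])}
    y0_hat = np.array([cell_mean[s] for s in score[treated]])
    return float(np.mean(y[treated] - y0_hat))

# Hypothetical toy data: four treated units followed by four controls.
y = [20, 22, 24, 10, 16, 8, 10, 12]
t = [1, 1, 1, 1, 0, 0, 0, 0]
score = [0.75, 0.75, 0.75, 0.25, 0.75, 0.25, 0.25, 0.25]
att = att_by_score_regression(y, t, score)
print(att)  # 4.5
```

With a continuous or estimated score, the cell mean would be replaced by a nonparametric regression, which is where the first-step estimation error discussed in the abstract enters.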

**Applied Welfare Analysis for Discrete Choice with Interval-data on Income** (Sep 2016), with Debopam Bhattacharya. (pdf) Revision requested at the *Journal of Econometrics*

This paper concerns empirical measurement of consumer welfare under interval-reported income. Bhattacharya (2015) has recently shown that for discrete choice, welfare distributions resulting from a hypothetical price change can be expressed as closed-form transformations of choice probabilities. However, when income is only interval-observed, as is the case in many household and marketing surveys, the choice probabilities, and hence the welfare distributions, are not point-identified. We derive bounds for the average compensating and equivalent variation in such scenarios under the assumption of a normal good, using shape restrictions from economic theory. A key finding of independent interest is a set of Slutsky conditions that are linear in average demand, unlike those for continuous choice. Our approach to welfare analysis is based on a best parametric approximation to choice probabilities, which facilitates imposition of these Slutsky conditions and leads to computationally simple inference for the partially identified features of welfare. In particular, our estimand is shown to be directionally differentiable, so that recently developed bootstrap methods can be applied for inference. The usefulness of these methods extends to more general settings where a class of set-identified functions is subject to linear inequality restrictions and one wishes to conduct inference on functionals thereof. We illustrate our theoretical results using a simulation exercise based on a real dataset where actual income is observed. We artificially introduce interval-censoring of income, calculate bounds for the average welfare effects of a subsidy using our methods, and find that they perform favorably in comparison to the true estimates obtained by using the actual income values.

**Partial Mean Processes with Generated Regressors: Continuous Treatment Effects and Nonseparable Models** (2015). (pdf)

Partial mean processes with generated regressors arise in several important econometric problems, such as the distribution of potential outcomes with continuous treatments and the quantile structural function in a nonseparable triangular model. This paper proposes a nonparametric estimator for the partial mean process, where the second step consists of a kernel regression on regressors that are estimated in the first step. The main contribution is a uniform expansion that characterizes in detail how the estimation error associated with the generated regressor affects the limiting distribution of the marginal integration estimator. The general results are illustrated with two examples: the generalized propensity score for a continuous treatment (Hirano and Imbens, 2004) and control variables in triangular models (Newey, Powell, and Vella, 1999; Imbens and Newey, 2009).
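A marginal integration (partial mean) estimator of this kind can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the paper's estimator: the regressor `v` stands in for the generated regressor and is treated as already computed, a product Gaussian kernel with a single bandwidth `h` is used, and all names and simulated data are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_regression(y, t, v, t0, v0, h):
    """Nadaraya-Watson estimate of E[Y | T = t0, V = v0] (product kernel)."""
    w = gaussian_kernel((t - t0) / h) * gaussian_kernel((v - v0) / h)
    return np.sum(w * y) / np.sum(w)

def partial_mean(y, t, v, t0, h):
    """Hold T at t0 and average the fitted regression over the sample values of V."""
    return float(np.mean([nw_regression(y, t, v, t0, vi, h) for vi in v]))

# Simulated data (hypothetical): V plays the role of a first-step estimate.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(size=n)
v = rng.uniform(size=n)
y = 1.0 + t + v + 0.1 * rng.normal(size=n)
estimate = partial_mean(y, t, v, t0=0.5, h=0.2)
```

In the paper's setting `v` would itself be estimated in a first step (for example a generalized propensity score or a control variable), and the uniform expansion describes how that estimation error carries through to the limit distribution.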

*Published papers:*

**Interpretation and Semiparametric Efficiency in Quantile Regression under Misspecification**, (pdf) *Econometrics* (2016), 4(1), 2

Allowing for misspecification in the linear conditional quantile function, this paper provides a new interpretation and the semiparametric efficiency bound for the quantile regression parameter in Koenker and Bassett (1978). The first result, on interpretation, shows that under a mean-squared loss function, the probability limit of the Koenker-Bassett estimator minimizes a weighted distribution approximation error, defined as the deviation of the conditional distribution function, evaluated at the linear quantile approximation, from the quantile level. The second result implies that the Koenker-Bassett estimator is a semiparametrically efficient estimator of the quantile regression parameter that produces parsimonious descriptive statistics for the conditional distribution. Therefore, quantile regression shares the attractive features of ordinary least squares: interpretability and semiparametric efficiency under misspecification.
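For readers less familiar with the Koenker-Bassett estimator, the following minimal Python sketch shows the check loss it minimizes, specialized to an intercept-only model where the minimizer is a sample quantile. The function names and data are hypothetical, and the sketch does not reproduce the paper's misspecification results.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def constant_quantile_fit(y, tau):
    """Minimize the average check loss over constants.

    A minimizer is always attained at a sample point, so it is enough
    to search over the observed values.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).mean() for c in y]
    return float(y[int(np.argmin(losses))])

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
median_fit = constant_quantile_fit(y, 0.5)
print(median_fit)  # 3.0, the sample median, unaffected by the outlier
```

Linear quantile regression replaces the constant with a linear index in the covariates and minimizes the same average loss.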

*Work in progress:*

**Nonparametric Weighted Average Quantile Derivative** (2013), *draft available upon request.*

We estimate the weighted Average Quantile Derivative (AQD), which is the expected value of the partial derivative of the conditional quantile function (CQF) weighted by a generic function of the covariates. We consider two weighting functions: (1) a known function chosen by researchers; (2) an estimated density function of the covariates, parallel to the density-weighted average mean derivative in Powell et al. (1989). The proposed estimator achieves root-n consistency and asymptotic normality through a first-step nonparametric kernel estimation of the unknown functions and a second-step sample analogue of the product-moment representation of the AQD. By including a stochastic trimming function to handle the denominator problem, the estimator is consistent for the weighted AQD defined on the entire support of the covariates. We provide a new Bahadur-type linear representation of the kernel-based CQF estimator, uniform over the covariates in an expanding compact set and over the quantile levels, which may be of independent interest. The unweighted AQD gives a succinct summary statistic for the average marginal response of the CQF to the covariates and defines a nonparametric quantile regression coefficient. For the semiparametric single-index and partial linear models, the weighted AQD identifies the coefficients up to scale. For the nonparametric nonseparable structural model, the AQD conveys a causal interpretation as the average structural effect of an endogenous variable under a conditional independence assumption.
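In symbols, the estimand described above can be written as follows. The notation is an assumption made here for illustration and may differ from the draft: $Q_\tau(Y \mid X = x)$ denotes the CQF at quantile level $\tau$, $w$ the weighting function, and $f$ the density of the covariates.

```latex
\mathrm{AQD}_\tau(w) = E\left[ w(X)\, \frac{\partial Q_\tau(Y \mid X = x)}{\partial x}\bigg|_{x = X} \right],
\qquad
w(X) \text{ known, or } w(X) = f(X) \text{ (density weighting)}.
```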

**Evaluating the Effects of Lengths of Participation in the Workforce Investment Act Adult Program via Decomposition Analysis**, with Wallice Ao and Sebastian Calonico, *draft available upon request.*

This paper studies how different lengths of participation in the Workforce Investment Act (WIA) Adult program affect participants’ post-program wage outcomes. We explore the role of the wage structure and the distribution of participants’ characteristics associated with different lengths of participation in contributing to the difference in observed wage distributions. To do so, we propose an efficient propensity score weighting estimator that decomposes the differences in wage distributions for participants with different lengths of program participation into (1) a wage-structure effect, arising from the different wage structures associated with different lengths of participation, and (2) a composition effect, arising from the different distributions of characteristics for participants with different lengths of participation. These counterfactual effects reveal causal effects under the unconfoundedness assumption, such as treatment effects for the treated, where the multivalued treatment variable is the length of participation. Our estimation results show that heterogeneity in lengths of participation is an important dimension for evaluating the WIA Adult program and other social programs in which participation varies. The results of this paper, both theoretical and empirical, provide a rigorous assessment of intervention programs and relevant suggestions for improving the performance and cost-effectiveness of these programs.
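The two-part decomposition can be illustrated with a simplified Python sketch. It is not the estimator from the draft: it decomposes a gap in mean wages rather than wage distributions, compares only two participation lengths, and estimates the propensity score by cell frequencies of a single discrete characteristic. All names and numbers are hypothetical.

```python
import numpy as np

def decompose_mean_gap(y, t, x):
    """Split mean(Y | T=1) - mean(Y | T=0) into a wage-structure effect
    and a composition effect by propensity score weighting.

    Assumes a discrete covariate x and a score strictly between 0 and 1.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    x = np.asarray(x)
    # Propensity score P(T = 1 | X = x), estimated by cell frequencies.
    p = np.array([t[x == xi].mean() for xi in x])
    long_, short = t == 1, t == 0
    # Reweight the T = 0 group to the characteristics of the T = 1 group.
    w = p[short] / (1.0 - p[short])
    counterfactual = np.sum(w * y[short]) / np.sum(w)
    structure = y[long_].mean() - counterfactual
    composition = counterfactual - y[short].mean()
    return float(structure), float(composition)

# Hypothetical toy data: T = 1 long participation, T = 0 short participation.
y = [20, 22, 24, 10, 16, 8, 10, 12]   # wages
t = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]          # one binary characteristic
structure, composition = decompose_mean_gap(y, t, x)
```

By construction the two components sum to the raw gap in mean wages between the two groups; the counterfactual mean is what the short-participation wage structure would pay a group with the long-participation characteristics.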