1. "High Dimensional Binary Choice Model with Unknown Heteroskedasticity or Instrumental Variables," 2025, Journal of Econometrics, with Thomas T. Yang.
Working paper version: [arXiv]
Abstract: This paper proposes a new method for estimating high-dimensional binary choice models. We consider a semiparametric model that places no distributional assumptions on the error term, allows for heteroskedastic errors, and permits endogenous regressors. Our approaches extend the special regressor estimator originally proposed by Lewbel (2000). This estimator becomes impractical in high-dimensional settings due to the curse of dimensionality in conditional density estimation. To overcome this challenge, we introduce an innovative data-driven dimension reduction method for nonparametric kernel estimators, which constitutes the main contribution of this work. The method combines distance-covariance-based screening with cross-validation (CV), enabling special regressor estimation in high dimensions. Using this new feasible conditional density estimator, we address variable and moment (instrumental variable) selection problems for these models. We apply penalized least squares (LS) and generalized method of moments (GMM) estimators with a Lasso penalty. A comprehensive analysis of the oracle and asymptotic properties of these estimators is provided. Finally, through Monte Carlo simulations and an empirical study of the migration intentions of rural Chinese residents, we demonstrate the effectiveness of our proposed methods in finite samples.
2. "Revisiting Panel Data Discrete Choice Models with Lagged Dependent Variables," 2025, Journal of Business & Economic Statistics, with Christopher Dobronyi and Thomas T. Yang.
Working paper version: [arXiv]
Abstract: This paper revisits the identification and estimation of a class of semiparametric (distribution-free) panel data binary choice models with lagged dependent variables, exogenous covariates, and entity-fixed effects. We provide a novel identification strategy based on an “identification at infinity” argument. In contrast with the celebrated Honoré and Kyriazidou (2000), our method permits time trends of any form and does not suffer from the “curse of dimensionality”. We propose an easily implementable conditional maximum score estimator. The asymptotic properties of the proposed estimator are fully characterized. A small-scale Monte Carlo study demonstrates that our approach performs satisfactorily in finite samples. We illustrate the usefulness of our method by presenting an empirical application to enrollment in private hospital insurance using the Household, Income and Labour Dynamics in Australia (HILDA) Survey data.
3. "Semiparametric Estimation of Dynamic Binary Choice Panel Data Models," 2025, Econometric Theory, with Thomas T. Yang.
Working paper version: [arXiv]
Abstract: We propose a new approach to the semiparametric analysis of panel data binary choice models with fixed effects and dynamics (lagged dependent variables). The model under consideration has the same random utility framework as in Honoré and Kyriazidou (2000). We demonstrate that, with additional serial dependence conditions on the process of deterministic utility and tail restrictions on the error distribution, the (point) identification of the model can proceed in two steps, and requires matching only the value of an index function of explanatory variables over time, rather than the value of each explanatory variable. Our identification method motivates an easily implementable, two-step maximum score (2SMS) procedure, producing estimators whose rates of convergence, unlike those of Honoré and Kyriazidou (2000), are independent of the model dimension. We then analyze the asymptotic properties of the 2SMS procedure and propose bootstrap-based distributional approximations for inference. Monte Carlo simulations indicate that our procedure performs satisfactorily in finite samples.
4. "Inference on Semiparametric Multinomial Response Models," 2021, Quantitative Economics, with Shakeeb Khan and Elie Tamer.
Online appendix: [online appendix]
Abstract: We explore inference on regression coefficients in semiparametric multinomial response models. We consider cross-sectional, static, and dynamic panel settings, focusing throughout on inference under sufficient conditions for point identification. The approach to identification uses a matching insight throughout all three models, coupled with variation in regressors: with cross-section data, we match across individuals, while with panel data, we match within individuals over time. Across models, we relax the Independence of Irrelevant Alternatives assumption and allow arbitrary correlation among the unobservables that determine the utilities of the various alternatives. For the cross-sectional model, estimation is based on a localized rank objective function, analogous to that used in Abrevaya, Hausman, and Khan (2010), and presents a generalization of existing approaches. In panel data settings, rates of convergence are shown to exhibit a curse of dimensionality with respect to the number of alternatives. The results for the dynamic panel data model generalize the work of Honoré and Kyriazidou (2000) to cover the semiparametric multinomial case. A simulation study establishes the adequate finite-sample properties of our new procedures. We apply our estimators to a scanner panel data set.
5. "Semiparametric Identification and Estimation of Discrete Choice Models for Bundles," 2020, Economics Letters, with Thomas T. Yang and Hanghui Zhang.
Online appendix: [online appendix]
Abstract: We study (point) identification of preference coefficients in semiparametric discrete choice models for bundles. The approach to identification uses an “identification at infinity” (Chamberlain, 1986) insight, combined with median independence restrictions on unobservables. We propose two-stage maximum-score (MS) estimators and establish their consistency. Monte Carlo evidence demonstrates that our approach performs satisfactorily in finite samples.
"Semiparametric Discrete Choice Models for Bundles," with Thomas T. Yang [arXiv].
"Dimension Reduction for Conditional Density Estimation with Applications to High-Dimensional Causal Inference," with Jianhua Mei and Thomas T. Yang [arXiv].
"Are Different Risky Behaviours Complements or Substitutes? Evidence from Smoking and Drinking," with Tianyi Li and KK Tang [SSRN].
"Uncovering Sparse Financial Networks with Information Criteria," with Thomas T. Yang and Wenying Yao [arXiv, SSRN].
"A Simple Approach for Discovering (Exact) Sparsity Structure in Nonparametric Regression," with Jianhua Mei and Thomas T. Yang [arXiv].
"Least Absolute Deviation Estimation of Distribution-Free Bundle Choice Models," with Thomas T. Yang [SSRN, online appendix].
"Does Social Capital Attenuate Negative Mental Health Shocks? Evidence From a Natural Experiment," with Jonas Fooken, Xinyang Li, and KK Tang [coming soon].
"Deep Neural Network Estimation of Semiparametric Multi-Index-Single-Crossing Models," with Yiran Xie, Thomas T. Yang, Nan Ye.
"Dimension-Reduced Doubly Robust DiD Estimation in High-Dimensional Settings," with Xinyang Li and KK Tang.
"Dependence-Based Dimension Reduction for Large Sample Sizes," with Jianhua Mei and Thomas T. Yang.
"A General Testing Framework for Valuation Distributions in First-Price Auctions," with Tong Li and Thomas T. Yang.