Working Papers
Control Function Approach to Multivalued Endogenous Treatment Effects (Joint with Jeffrey M. Wooldridge), To be submitted to The Econometrics Journal; previously rejected by JBES
Abstract: Using a control function (CF) approach, we study average treatment effects (ATEs) of discrete, multivalued endogenous treatments in the presence of correlated random coefficients (CRCs) and heterogeneous counterfactual errors, extending the existing results for binary treatments. Specifically, we propose a consistent CF estimator for the ATEs and state its asymptotic properties in this framework and in two special cases. Using Monte Carlo simulations, we also compare the CF method with the instrumental variables (IV) method. The simulation results suggest that the CF method can be asymptotically up to 12% more efficient than the IV method, and that the asymptotic bias in IV estimates can be as high as 43%. However, when misspecification is introduced, the simulation results favor the IV method. As an empirical illustration, we apply OLS, CF, IV, and nonparametric bound analysis to estimate how limited English proficiency (LEP) influences the wages of Hispanic workers in the USA. The data come from the 1% Public Use Microdata Sample of the 1990 US Census. With age at arrival as the instrument, the CF estimates, like the OLS estimates, indicate that LEP imposes a significant wage penalty (up to 79% in some CF estimates) on the Hispanic community in the USA. The IV method mostly produces insignificant results, and the nonparametric bound analysis yields uninformative lower bounds.
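The mechanics of a two-step control function estimator can be sketched as follows. This is a minimal illustration only, not the estimator from the paper: it swaps in a continuous endogenous regressor with a linear first stage, whereas the paper's discrete multivalued treatment would need a different first stage. The data-generating process, the parameter values, and the helper `ols` are all assumptions made for the example.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data-generating process (all numbers are illustrative).
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]            # instrument
v = [rng.gauss(0, 1) for _ in range(n)]            # first-stage error
u = [0.8 * vi + rng.gauss(0, 0.6) for vi in v]     # outcome error, correlated with v
x = [1.0 + 0.5 * zi + vi for zi, vi in zip(z, v)]  # endogenous regressor
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]  # true slope on x is 1.5

# Step 1: regress x on the instrument and keep the residual as the control function.
g = ols([[1.0, zi] for zi in z], x)
v_hat = [xi - g[0] - g[1] * zi for xi, zi in zip(x, z)]

# Step 2: regress y on x and the first-stage residual.
beta_cf = ols([[1.0, xi, vh] for xi, vh in zip(x, v_hat)], y)[1]

# Naive OLS that ignores endogeneity, for comparison.
beta_ols = ols([[1.0, xi] for xi in x], y)[1]

print(f"CF slope:  {beta_cf:.3f}")
print(f"OLS slope: {beta_ols:.3f}")
```

In this linear special case the CF slope coincides numerically with two-stage least squares, so the sketch shows only how the first-stage residual enters the second step. It does not reproduce the CF versus IV comparison described in the abstract, and valid second-step standard errors would need to account for the estimated residual.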
Second-Order Refinements for Bias-Corrected Inference with AI-Generated Regressors (In preparation for submission to Econometric Theory)
Abstract: I study second-order inference for regressions that use AI/ML-generated covariates as regressors in a second-step linear model. Existing bias corrections remove the leading centering distortion from first-step prediction error, yielding first-order valid confidence intervals but potentially sizable coverage error in finite samples. I derive a second-order expansion for the bias-corrected estimator and a uniform Edgeworth expansion for the studentized statistic under a regime that includes common imputed-label settings. The resulting Cornish–Fisher and recentering refinements deliver better coverage accuracy.
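For readers unfamiliar with Cornish–Fisher refinements, the sketch below shows the idea in the simplest textbook setting, a studentized sample mean, where the one-term Edgeworth correction depends on the skewness of the data. It is not the expansion derived in the paper, whose correction terms for a bias-corrected regression estimator will differ. The function name and the example inputs are assumptions.

```python
from statistics import NormalDist


def cornish_fisher_quantile(alpha, skewness, n):
    """One-term Cornish-Fisher approximation to the alpha-quantile of the
    studentized sample mean T = sqrt(n) * (mean - mu) / s.

    It inverts the one-term Edgeworth expansion
        P(T <= x) = Phi(x) + n**-0.5 * skewness * (2 x^2 + 1) / 6 * phi(x),
    so the normal quantile z is shifted by -skewness * (2 z^2 + 1) / (6 sqrt(n)).
    """
    z = NormalDist().inv_cdf(alpha)
    return z - skewness * (2.0 * z * z + 1.0) / (6.0 * n ** 0.5)


# Illustrative inputs: skewness 1.0 and sample size 50 are made-up values.
z_upper = NormalDist().inv_cdf(0.975)
cf_upper = cornish_fisher_quantile(0.975, skewness=1.0, n=50)
print(f"normal critical value:    {z_upper:.4f}")
print(f"Cornish-Fisher corrected: {cf_upper:.4f}")
```

With zero skewness the correction vanishes and the normal critical value is recovered. The shift shrinks at rate 1/sqrt(n), which is why it matters mainly in small and moderate samples.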
Variable Selection: Classical versus Machine Learning Methods (In preparation for submission to Journal of Applied Econometrics)
Abstract: This study compares classical variable selection methods commonly used in economics with machine learning-based methods, focusing on their impact on regression results, selected variables, and potential efficiency gains. Using a dataset from a previous study, I apply both sets of methods to highlight similarities and differences. Specifically, I replicate an economics paper that used classical variable selection techniques and reanalyze its findings using machine learning-based methods. Robustness is assessed through metrics such as cross-validation, AIC, and BIC. The paper offers empirical insights into the relative strengths and limitations of the two approaches.
Work in Progress
An Application of Machine Learning Methods to Demand for Organic Fruits
Abstract: I estimate monthly household demand for organic fruit by household income class using machine learning methods (e.g., LASSO, support vector machines, bagging, and random forests) and standard methods (e.g., stepwise regression, forward stagewise regression, linear regression, and the conditional logit), and I compare their predictive power. I use a sample of US household purchases of organic and conventional fruit from 2011 through 2013, drawn from the Nielsen Corporation’s Consumer Panel Data.
Instrumental Variables Estimation for the Effectiveness of Peer Tutoring Programs with Self-selection Problem: Evidence from Economics Help Rooms and Integrative Studies Peer-Assisted Learning Sessions at Michigan State University
Abstract: I analyze the effectiveness of the MSU College of Social Sciences’ Economics Help Rooms and Peer-Assisted Learning (PAL) program. Over a period of 6 semesters from Fall 2017 through Spring 2020, I collected administrative and survey data for all students enrolled in classes served by the Economics Help Rooms and PAL program. The data include the final grade that each student received for the class; gender; minority status; class; high school GPA; college grades thus far; composite ACT score; Pell grant eligibility; whether the student is an international student, a first-generation college student, and/or an intercollegiate athlete; whether and how often the student visited the help rooms and/or PAL program for the class; and some other student characteristics. In a first-stage negative binomial regression using distance to the help rooms and/or PAL program locations as an identification variable, I tackle the self-selection problem and generate fitted values for student visits. With these fitted values, in the second-stage instrumental variables ordered probit regression, I relate the use of the help rooms and/or PAL program to the final grade in the course. My empirical results suggest that the help rooms and PAL program contribute to higher grades for students.
Weak-instrument Robust Estimation and Inference in Linear Instrumental Variables Regression with Heteroskedasticity and a Single Multivalued Endogenous Explanatory Variable
Abstract: I investigate the detection of weak instruments, weak-instrument robust estimation and inference. I especially focus on the case where the errors in the reduced-form and first-stage regressions are heteroskedastic, and the linear IV regression has a single multivalued endogenous explanatory variable with weak instruments. It is also worth analyzing how nonlinear instruments (e.g., predicted probabilities from the first-stage regression) can help with a weak IV in this framework.
Control Function Method vs. Instrumental Variables: An Asymptotic Efficiency Comparison
Abstract: From an asymptotic efficiency standpoint, I compare the average treatment effect estimates of the instrumental variables method with those of the control function method. In this work, I specifically consider a discrete multivalued endogenous treatment and carry out a brute-force comparison of the asymptotic variance-covariance matrices of the two methods in the positive semidefinite sense.
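A comparison in the positive semidefinite sense means checking whether the difference of the two asymptotic variance matrices has no negative eigenvalues. The sketch below illustrates that check for the 2x2 case (for instance, two ATEs relative to a base treatment level). The matrices `v_iv` and `v_cf` are made-up numbers for illustration, not results from this work, and the function names are assumptions.

```python
import math


def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius


def is_psd_difference(v_big, v_small, tol=1e-12):
    """True if v_big - v_small is positive semidefinite (2x2 symmetric inputs)."""
    diff = [[v_big[i][j] - v_small[i][j] for j in range(2)] for i in range(2)]
    return min_eigenvalue_2x2(diff) >= -tol


# Hypothetical asymptotic variance matrices (illustrative values only).
v_iv = [[2.0, 0.5], [0.5, 1.5]]
v_cf = [[1.6, 0.4], [0.4, 1.3]]

# If V_IV - V_CF is PSD, every linear combination of the ATE estimates has
# weakly smaller asymptotic variance under CF than under IV.
print("V_IV - V_CF is PSD:", is_psd_difference(v_iv, v_cf))
print("V_CF - V_IV is PSD:", is_psd_difference(v_cf, v_iv))
```

When neither difference is positive semidefinite, the two estimators cannot be ranked in this sense, even if one has smaller variance for each individual coefficient.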
Breathomics: A Multidimensional Approach to Rapid Early Cancer Detection Using Artificial Intelligence Algorithms and Advanced Sensors for Breath-Molecular Biomarkers (with PI Talayeh Razzaghi and Co-PI Thirumalai Venkatesan), OU Big Idea Challenge 2.0 Competition
Summary: This research initiative pioneers advancements in cancer diagnostics, with a primary focus on developing an innovative artificial intelligence and machine learning model for early detection using Breathomics (the study of volatile organic compounds in human breath), with a specific emphasis on pancreatic cancer. I am part of the socioeconomic analysis team that explores the cost-effectiveness and other economic welfare implications of the proposed model in comparison to existing cancer diagnostics. This encompasses considerations such as healthcare expenditures, potential long-term treatment cost savings, and the overall economic feasibility of the technology. Additionally, I play a role in preparing and submitting grant proposals to secure funding for the research project.