Working Papers
Control Function Approach to Multivalued Endogenous Treatment Effects (Under Review, Journal of Business and Economic Statistics, joint with Jeffrey M. Wooldridge)
Abstract: Using a control function (CF) approach, we study average treatment effects (ATEs) of discrete, endogenous multivalued treatments in the presence of correlated random coefficients (CRCs) and heterogeneous counterfactual errors, extending existing results for binary treatments. Specifically, we propose a consistent CF estimator of the ATEs and derive its asymptotic properties in this framework and in two special cases. Using Monte Carlo simulations, we then compare the CF method with the instrumental variables (IV) method. The simulation results suggest that the CF method can be up to 12% more asymptotically efficient than the IV method, and that the asymptotic bias of the IV estimates can be as high as 43%. However, when misspecification is introduced, the simulation results favor the IV method. As an empirical illustration, we apply OLS, CF, IV, and nonparametric bound analysis to estimate how limited English proficiency (LEP) influences the wages of Hispanic workers in the USA, using data from the 1% Public Use Microdata Sample of the 1990 US Census. With age at arrival as an instrument, both the OLS and CF methods indicate that LEP imposes a significant wage penalty (up to 79% in some CF estimates) on the Hispanic community in the USA. The IV method mostly produces insignificant results, and the nonparametric bound analysis yields uninformative lower bounds.
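The CF mechanics can be illustrated in the simplest possible setting: one continuous endogenous regressor, one instrument, and constant coefficients. The following is a minimal NumPy sketch on simulated data (the data-generating process, variable names, and parameter values are all hypothetical); it is not the multivalued CRC estimator from the paper. In this linear special case, adding the first-stage residual to the outcome regression reproduces the 2SLS estimates exactly, so CF/IV differences like those reported above can only appear once the model departs from this case.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data (illustration only)
rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # structural error
v = 0.8 * u + rng.normal(size=n)       # first-stage error, correlated with u
x = 1.0 + 0.7 * z + v                  # endogenous regressor
y = 2.0 + 1.5 * x + u                  # true slope is 1.5

ones = np.ones(n)
Zmat = np.column_stack([ones, z])

# Step 1: first-stage OLS of x on (1, z); the residual is the control function
gamma = ols(x, Zmat)
v_hat = x - Zmat @ gamma

# Step 2: CF regression of y on (1, x, v_hat); keep the coefficients on (1, x)
beta_cf = ols(y, np.column_stack([ones, x, v_hat]))[:2]

# Benchmarks: 2SLS (y on first-stage fitted values) and naive OLS
beta_2sls = ols(y, np.column_stack([ones, Zmat @ gamma]))
beta_ols = ols(y, np.column_stack([ones, x]))

print("OLS:", beta_ols, "CF:", beta_cf, "2SLS:", beta_2sls)
```

Naive OLS is biased upward here because v and u are positively correlated, while the CF and 2SLS slopes agree with each other and sit near the true value.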
Estimation for Multivalued Endogenous Treatment Effect Models Using High Dimensional Methods: A Simulation Study (in preparation for submission to Journal of Applied Econometrics)
Abstract: Using a simulation study, I examine the finite-sample performance of several machine learning (ML) methods and the CF method for discrete, multivalued endogenous treatments in a particular setting: there is an extra set of high-dimensional variables, of which a low-dimensional (and unknown) subset affects the outcome, yet all of these high-dimensional variables are ignorable in the decision to take up the treatment, given the instruments in the selection equation. I also allow for non-Gaussian and heterogeneous counterfactual errors in the model and use a CF approach to address endogeneity. To estimate the parameters of interest, I use the CF method and four LASSO-based ML methods: the least absolute shrinkage and selection operator (LASSO), post-partial-out LASSO, post-double-selection LASSO, and double/debiased LASSO. I then compare their performance in terms of the bias of the estimates, the standard deviation of the estimates, mean absolute prediction error, root mean square error, the mean number of correctly selected covariates, and the mean size of the selected set of covariates. The main Monte Carlo finding is that, in addition to matching the CF method in finite-sample bias when the high-dimensional variables are orthogonal to the variables of interest already included, the LASSO-based methods can outperform the CF method in efficiency for ATE estimation when the high-dimensional set of outcome predictors contains enough extra predictive variables that are ignorable in treatment selection.
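Two of the selection metrics named above (number of correctly selected covariates and size of the selected set) can be made concrete with a plain LASSO. The following is a minimal NumPy sketch, assuming a sparse linear outcome equation with standardized covariates and more covariates than observations; the design, the penalty level `lam`, and the helper names are hypothetical. It covers only the plain LASSO step, not the partialling-out, double-selection, or debiased variants, the CF correction, or the multivalued treatment.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes each column of X has mean 0 and variance 1 and y is centered,
    so no intercept is estimated.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # add back column j's current contribution
            rho = X[:, j] @ r / n       # covariance of column j with partial residual
            b[j] = soft_threshold(rho, lam)  # no rescaling: column variance is 1
            r -= X[:, j] * b[j]
    return b

# Hypothetical design: p > n, only the first three covariates enter the outcome
rng = np.random.default_rng(123)
n, p, reps = 200, 300, 5
lam = 0.25                              # hypothetical penalty, near sigma*sqrt(2*log(p)/n)
true_support = {0, 1, 2}
b_true = np.zeros(p)
b_true[[0, 1, 2]] = [2.0, -1.5, 1.0]

n_correct, n_selected = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = X @ b_true + rng.normal(size=n)
    y = y - y.mean()
    selected = set(np.flatnonzero(lasso_cd(X, y, lam)).tolist())
    n_correct.append(len(selected & true_support))  # correctly selected covariates
    n_selected.append(len(selected))                 # size of the selected set

mean_correct = float(np.mean(n_correct))
mean_size = float(np.mean(n_selected))
print("mean correctly selected:", mean_correct, "mean selected-set size:", mean_size)
```

Averaging these counts over replications gives the "mean number of correctly selected covariates" and "mean size of selected set" style summaries; a larger `lam` trades fewer false selections for more shrinkage of the retained coefficients.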
Work in Progress
An Application of Machine Learning Methods to Demand for Organic Fruits
Abstract: I estimate monthly household demand for organic fruit by household income class using machine learning methods (e.g., LASSO, support vector machines, bagging, and random forests) and standard methods (e.g., stepwise regression, forward stagewise regression, linear regression, and the conditional logit), and I compare their predictive power. I use a sample of US households' organic and conventional fruit purchases from 2011 through 2013, drawn from the Nielsen Corporation's Consumer Panel Data.
Instrumental Variables Estimation for the Effectiveness of Peer Tutoring Programs with Self-selection Problem: Evidence from Economics Help Rooms and Integrative Studies Peer-Assisted Learning Sessions at Michigan State University
Abstract: I analyze the effectiveness of the MSU College of Social Sciences' Economics Help Rooms and Peer-Assisted Learning (PAL) program. Over six semesters, from Fall 2017 through Spring 2020, I collected administrative and survey data for all students enrolled in classes served by the Economics Help Rooms and the PAL program. The data include the final grade each student received in the class; gender; minority status; class; high school GPA; college grades to date; composite ACT score; Pell grant eligibility; whether the student is an international student, a first-generation college student, and/or an intercollegiate athlete; whether and how often the student visited the help rooms and/or PAL program for the class; and other student characteristics. To address the self-selection problem, I estimate a first-stage negative binomial regression that uses distance to the help room and/or PAL program locations as the identifying variable, and I generate fitted values for student visits. In a second-stage instrumental variables ordered probit regression, I use these fitted values to relate use of the help rooms and/or PAL program to the final course grade. My empirical results suggest that the help rooms and PAL program contribute to higher grades for students.
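The first-stage step (a count model for visits, then fitted visits for the second stage) can be sketched as follows. This is a minimal NumPy sketch on simulated data that, for brevity, swaps in a Poisson regression fitted by iteratively reweighted least squares for the negative binomial model; both specify the same exponential conditional mean, so the fitted visits are formed the same way. The variables `distance` and `hs_gpa`, the helper `poisson_irls`, and all parameter values are hypothetical, and the second-stage ordered probit is not shown.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson regression (log link) fitted by iteratively reweighted least squares.

    Assumes the first column of X is the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # working response
        XtW = X.T * mu                      # IRLS weights equal mu under the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical data: distance (excluded variable) and one control
rng = np.random.default_rng(7)
n = 5000
distance = rng.uniform(0.0, 3.0, size=n)
hs_gpa = rng.normal(3.3, 0.4, size=n)
X1 = np.column_stack([np.ones(n), distance, hs_gpa])
true_beta = np.array([-0.49, -0.6, 0.3])
visits = rng.poisson(np.exp(X1 @ true_beta)).astype(float)

beta_hat = poisson_irls(X1, visits)
visits_hat = np.exp(X1 @ beta_hat)          # fitted visits passed to the second stage
print("first-stage estimates:", beta_hat)
```

Because the fitted visits are themselves estimated, second-stage standard errors should account for the first stage, for example by bootstrapping both steps together.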
Weak-instrument Robust Estimation and Inference in Linear Instrumental Variables Regression with Heteroskedasticity and a Single Multivalued Endogenous Explanatory Variable
Abstract: I investigate the detection of weak instruments and weak-instrument robust estimation and inference. I focus especially on the case in which the errors in the reduced-form and first-stage regressions are heteroskedastic and the linear IV regression has a single multivalued endogenous explanatory variable with weak instruments. I also analyze how nonlinear instruments (e.g., predicted probabilities from the first-stage regression) can help address weak instruments in this framework.
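One common way to screen instrument strength under heteroskedasticity is to examine the first stage with robust variance estimates. The following minimal NumPy sketch, on simulated data, computes a heteroskedasticity-robust (HC0) first-stage Wald F and the Montiel Olea-Pflueger effective F for a single endogenous regressor taking the values 0, 1, and 2. The data-generating process, instrument strengths, and function names are hypothetical. The critical values needed to turn these statistics into a formal weak-instrument test are not included.

```python
import numpy as np

def partial_out(a, W):
    """Residuals from an OLS regression of a (vector or matrix) on W."""
    coef, *_ = np.linalg.lstsq(W, a, rcond=None)
    return a - W @ coef

def first_stage_diagnostics(x, Z, W):
    """Robust first-stage F and effective F for one endogenous regressor.

    x: (n,) endogenous regressor; Z: (n, k) excluded instruments;
    W: (n, m) included exogenous regressors, including a constant.
    """
    k = Z.shape[1]
    xt, Zt = partial_out(x, W), partial_out(Z, W)
    ZZ = Zt.T @ Zt
    ZZ_inv = np.linalg.inv(ZZ)
    pi_hat = ZZ_inv @ (Zt.T @ xt)                # first-stage coefficients
    v_hat = xt - Zt @ pi_hat                     # first-stage residuals
    meat = (Zt * v_hat[:, None] ** 2).T @ Zt     # sum_i v_i^2 z_i z_i'
    V_rob = ZZ_inv @ meat @ ZZ_inv               # HC0 covariance of pi_hat
    robust_F = pi_hat @ np.linalg.solve(V_rob, pi_hat) / k
    effective_F = (pi_hat @ ZZ @ pi_hat) / np.trace(V_rob @ ZZ)
    return float(robust_F), float(effective_F)

# Hypothetical simulated data (illustration only)
rng = np.random.default_rng(2024)
n = 2000
Z = rng.normal(size=(n, 3))
W = np.column_stack([np.ones(n), rng.normal(size=n)])

def make_x(strength):
    """Multivalued (0/1/2) endogenous variable with heteroskedastic latent errors."""
    scale = 1.0 + 0.5 * np.abs(Z[:, 0])
    latent = strength * Z.sum(axis=1) + scale * rng.normal(size=n)
    return np.digitize(latent, [-0.5, 0.5]).astype(float)

strong = first_stage_diagnostics(make_x(1.0), Z, W)
weak = first_stage_diagnostics(make_x(0.0), Z, W)
print("strong:", strong, "weak:", weak)
```

With a single instrument (k = 1) the two statistics coincide, and under homoskedasticity the effective F reduces to the usual first-stage F. With several instruments and heteroskedastic errors, they can differ.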
Control Function Method vs. Instrumental Variables: An Asymptotic Efficiency Comparison
Abstract: From an asymptotic efficiency standpoint, I compare the average treatment effect estimates from the instrumental variables method with those from the control function method. Specifically, I consider a discrete multivalued endogenous treatment and carry out a brute-force comparison of the two methods' asymptotic variance-covariance matrices in the positive semidefinite sense.
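In the positive semidefinite sense, estimator a is weakly more efficient than estimator b when Avar(b) - Avar(a) is PSD. Equivalently, c'(Avar(b) - Avar(a))c >= 0 for every vector c, so no linear combination of the ATE estimates has a smaller asymptotic variance under b. The following minimal NumPy sketch checks this through eigenvalues. The 2x2 matrices stand in for CF and IV asymptotic variances, but they are hypothetical and are not results from this work.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True if the symmetric part of M has no eigenvalue below -tol."""
    S = (M + M.T) / 2.0
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

def rank_by_asymptotic_variance(V_a, V_b, tol=1e-10):
    """Compare two asymptotic variance matrices in the PSD (Loewner) order."""
    D = np.asarray(V_b, dtype=float) - np.asarray(V_a, dtype=float)
    a_weakly_better = is_psd(D, tol)    # V_b - V_a is PSD
    b_weakly_better = is_psd(-D, tol)   # V_a - V_b is PSD
    if a_weakly_better and b_weakly_better:
        return "equal"
    if a_weakly_better:
        return "a at least as efficient"
    if b_weakly_better:
        return "b at least as efficient"
    return "not ranked"

# Hypothetical asymptotic variances for two ATE estimates (illustration only)
V_cf = np.array([[1.0, 0.2], [0.2, 0.5]])
V_iv = V_cf + np.array([[0.3, 0.1], [0.1, 0.2]])  # adds a positive definite matrix
V_other = V_cf + np.diag([0.3, -0.1])              # larger in one direction, smaller in another

print(rank_by_asymptotic_variance(V_cf, V_iv))     # a at least as efficient
print(rank_by_asymptotic_variance(V_iv, V_cf))     # b at least as efficient
print(rank_by_asymptotic_variance(V_cf, V_other))  # not ranked
```

The "not ranked" case shows why a brute-force PSD comparison is needed. One estimator can have a smaller variance for one ATE and a larger variance for another, in which case neither dominates.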
Breathomics: A Multidimensional Approach to Rapid Early Cancer Detection Using Artificial Intelligence Algorithms and Advanced Sensors for Breath-Molecular Biomarkers (with PI Talayeh Razzaghi and Co-PI Thirumalai Venkatesan), OU Big Idea Challenge 2.0 Competition
Summary: This research initiative seeks to advance cancer diagnostics, focusing primarily on developing an innovative artificial intelligence and machine learning model for early detection using breathomics (the study of volatile organic compounds in human breath), with a specific emphasis on pancreatic cancer. I am part of the socioeconomic analysis team, which explores the cost-effectiveness and other economic welfare implications of the proposed model relative to existing cancer diagnostics. This work covers healthcare expenditures, potential long-term savings in treatment costs, and the overall economic feasibility of the technology. I also help prepare and submit grant proposals to secure funding for the project.