Working Papers
Machine Learning using Nonstationary Data [Paper]
Abstract: Machine learning offers a promising set of tools for forecasting. However, many of its well-established properties may not hold when applied to nonstationary data. This paper proposes a straightforward procedure to adapt machine learning methods for nonstationary datasets. The proposed method effectively removes nonstationarity without requiring the researcher to identify in advance which variables are nonstationary or the specific nature of the nonstationarity, a feature particularly desirable in high-dimensional settings. As a starting point for the theoretical foundation, I show that applying this procedure in combination with LASSO or adaptive LASSO yields consistent variable selection when working with a mix of stationary and nonstationary explanatory variables. To assess its empirical performance, I apply the method to forecasting U.S. inflation rates and the growth of the industrial production index using a number of different machine learning techniques. The findings reveal that the proposed method either significantly improves prediction accuracy over traditional practices or delivers comparable performance, making it a robust and reliable choice for extracting stationary components from high-dimensional data prior to machine learning-based forecasting.
Principal Component Analysis using Nonstationary Series (with James Hamilton and Xinwei Ma). [Link]
Abstract: This paper develops a procedure for uncovering the common cyclical factors that drive a mix of stationary and nonstationary variables. The method does not require knowing which variables are nonstationary or the nature of the nonstationarity. An application to the FRED-MD macroeconomic dataset demonstrates that the approach offers similar benefits to those of traditional principal component analysis with some added advantages.
Publications
Model Selection for Multivalued-Treatment Policy Learning in Observational Studies (with Yue Fang and Haitian Xie). Journal of Business & Economic Statistics (2025). [Link]
Abstract: This study investigates the policy learning problem in observational studies, where the treatment variable can be multivalued and the propensity scores are unknown. We approximate the optimal policy in a global policy class with infinite complexity (VC/Natarajan) dimension, using a sequence of sieve policy classes with finite complexity dimension. The optimal policy within each sieve class is estimated by maximizing the empirical welfare, constructed through the doubly robust moment condition and the cross-fitting method. To select a suitable sieve space, we maximize the penalized empirical welfare, with the penalty determined by either the Rademacher complexity or a holdout method. We establish oracle inequalities that demonstrate the bias-variance tradeoff achieved by the data-driven policy estimator. We also investigate two specific sieve selections: (a) a monotone single index model and (b) a systematic discretization method, which uses conventional sieve results for smooth functions such as linear sieves and deep neural networks. In the empirical study, we apply our method to examine the policy of assigning individuals to job training of different lengths.
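As a rough illustration of the ingredients named in this abstract, the sketch below computes standard doubly robust (AIPW) scores for a multivalued treatment, evaluates the empirical welfare of a policy, and picks among candidate policies by penalized welfare. It is not the paper's implementation: the function names, the array layout, and the penalty values are assumptions made for illustration, and the nuisance estimates `mu_hat` and `e_hat` are assumed to have been produced beforehand by cross-fitting.

```python
import numpy as np

def doubly_robust_scores(y, d, mu_hat, e_hat):
    """Return scores[i, k], an AIPW estimate of unit i's outcome under treatment k.

    y: (n,) outcomes; d: (n,) integer treatment labels in {0, ..., K-1};
    mu_hat: (n, K) outcome-regression predictions; e_hat: (n, K) propensity scores.
    Assumption: both nuisance arrays come from cross-fitting (fit on other folds).
    """
    k = mu_hat.shape[1]
    received = np.eye(k)[d]                       # one-hot indicator 1{D_i = k}
    residual = (y[:, None] - mu_hat) * received   # nonzero only for the arm received
    return mu_hat + residual / e_hat

def empirical_welfare(scores, policy_actions):
    """Average score of the treatment that the policy assigns to each unit."""
    return scores[np.arange(len(policy_actions)), policy_actions].mean()

def select_policy(scores, candidates, penalties):
    """Pick the candidate with the largest penalized empirical welfare.

    candidates: one action vector per sieve class (hypothetical inputs);
    penalties: one complexity penalty per class (hypothetical values).
    """
    values = [empirical_welfare(scores, a) - p for a, p in zip(candidates, penalties)]
    return int(np.argmax(values)), values
```

Here each entry of `candidates` stands for the welfare-maximizing policy already found within one sieve class; the search within a class and the construction of the penalty (Rademacher complexity or holdout) are left out of the sketch.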
Strength in numbers: robust mechanisms for public goods with many agents (with Haitian Xie). Social Choice and Welfare (2023). [Link]
Abstract: This study examines the mechanism design problem for public goods provision in a large economy with n independent agents. We propose a class of dominant-strategy incentive compatible and ex-post individually rational mechanisms, which we call the adjusted mean-thresholding (AMT) mechanisms. We show that when the cost of provision grows slower than the square root n rate, the AMT mechanisms are both eventually ex-ante budget balanced and asymptotically efficient. When the cost grows faster than the square root n rate, in contrast, we show that any incentive compatible, individually rational, and eventually ex-ante budget balanced mechanism must have provision probability converging to zero and hence cannot be asymptotically efficient. The AMT mechanisms have a simple form and are more informationally robust when compared to, for example, the second-best mechanism. This is because the construction of an AMT mechanism depends only on the first moment of the valuation distribution.