Working Papers
Coverage, Reasoning, Dynamics, Identification (with Cynthia Wu and Shihan Xie) [link]
Abstract: We propose a new LLM-based survey framework that enables retrospective coverage, economic reasoning, dynamic effects, and clean identification. We recover human-comparable treatment effects in a multi-wave randomized controlled trial of inflation expectations surveys, at 1/1000 the cost. To demonstrate the framework's full potential, we extend the benchmark human survey (10 waves, 2018–2023) to over 50 waves dating back to 1990. We further examine the economic mechanisms underlying agents' expectation formation, identifying the mean-reversion and individual-attention channels. Finally, we trace dynamic treatment effects and demonstrate clean identification. Together, these innovations show that LLM surveys enable research designs unattainable with human surveys.
A Synthetic Business Cycle Approach to Counterfactual Analysis with Nonstationary Macroeconomic Data (with Zhentao Shi and Haitian Xie) [link][code]
Abstract: This paper investigates the use of synthetic control methods for causal inference in macroeconomic settings when dealing with possibly nonstationary data. While the synthetic control approach has gained popularity for estimating counterfactual outcomes, we caution researchers against assuming a common nonstationary trend factor across units for macroeconomic outcomes, as doing so may result in misleading causal estimation—a pitfall we refer to as the spurious synthetic control problem. To address this issue, we propose a synthetic business cycle framework that explicitly separates trend and cyclical components. By leveraging the treated unit's historical data to forecast its trend and using control units only for cyclical fluctuations, our divide-and-conquer strategy eliminates spurious correlations and improves the robustness of counterfactual prediction in macroeconomic applications. As empirical illustrations, we examine the cases of German reunification and the handover of Hong Kong, demonstrating the advantages of the proposed approach.
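The divide-and-conquer strategy described above can be sketched in a few lines. The sketch below is illustrative only: it assumes polynomial trends and an unrestricted least-squares fit on the cyclical components, whereas the paper's estimator may impose further restrictions (e.g., nonnegative weights).

```python
import numpy as np

def synthetic_business_cycle(y1, Y0, T0, deg=1):
    """Illustrative divide-and-conquer counterfactual:
    trend from the treated unit's own history, cycle from controls.
    y1: treated outcome (length T); Y0: control outcomes (T x J);
    T0: number of pre-treatment periods. All names are hypothetical."""
    T = len(y1)
    t = np.arange(T)
    # (1) Trend: fit a polynomial to the treated unit's pre-treatment
    #     data and extrapolate it over the full sample.
    trend1 = np.polyval(np.polyfit(t[:T0], y1[:T0], deg), t)
    # (2) Cycles: detrend each control unit with its own polynomial
    #     trend, keeping only cyclical fluctuations for the donor pool.
    cyc0 = np.column_stack([
        Y0[:, j] - np.polyval(np.polyfit(t, Y0[:, j], deg), t)
        for j in range(Y0.shape[1])
    ])
    cyc1 = y1 - trend1
    # (3) Weights: match the treated unit's pre-treatment cycle with a
    #     least-squares combination of the control cycles.
    w, *_ = np.linalg.lstsq(cyc0[:T0], cyc1[:T0], rcond=None)
    # Counterfactual = own trend forecast + weighted control cycles.
    return trend1 + cyc0 @ w
```

Because the trend is forecast from the treated unit alone, the control units never contribute a (possibly spurious) common trend, only cyclical comovement.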
Machine Learning using Nonstationary Data [paper]
Abstract: Machine learning offers a promising set of tools for forecasting. However, many of its well-established properties may not hold when applied to nonstationary data. This paper proposes a straightforward procedure to adapt machine learning methods for nonstationary datasets. The proposed method effectively removes nonstationarity without requiring the researcher to identify in advance which variables are nonstationary or the specific nature of the nonstationarity, a feature particularly desirable in high-dimensional settings. As a starting point to establish the theoretical foundation, I illustrate that applying this procedure in combination with LASSO or adaptive LASSO yields consistent variable selection when working with a mix of stationary and nonstationary explanatory variables. To examine its empirical performance, I apply the method to forecasting U.S. inflation rates and the growth of the industrial production index using a number of different machine learning techniques. The findings reveal that the proposed method either significantly improves prediction accuracy over traditional practices or delivers comparable performance, making it a robust and reliable choice for extracting stationary components from high-dimensional data prior to machine learning-based forecasting.
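The idea of removing nonstationarity uniformly before variable selection can be illustrated with a toy example. Uniform first-differencing stands in here for the paper's procedure (an assumption, not the paper's exact transform), combined with a plain coordinate-descent LASSO:

```python
import numpy as np

def stationarize(X):
    """Apply one transform to every column so the researcher never has
    to pre-classify variables as stationary or nonstationary.
    (First-differencing everything is one simple such transform; the
    paper's procedure may differ in detail.)"""
    return np.diff(X, axis=0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain cyclic coordinate descent for the LASSO
    (1/2n)||y - Xb||^2 + lam * ||b||_1, X column-standardized."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
    return beta
```

After differencing, a relevant random-walk regressor is kept while an irrelevant one is dropped, even though both look equally "trending" in levels.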
Principal Component Analysis using Nonstationary Series (with James Hamilton and Xinwei Ma) [link]
Abstract: This paper develops a procedure for uncovering the common cyclical factors that drive a mix of stationary and nonstationary variables. The method does not require knowing which variables are nonstationary or the nature of the nonstationarity. An application to the FRED-MD macroeconomic dataset demonstrates that the approach offers similar benefits to those of traditional principal component analysis with some added advantages.
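A minimal sketch of extracting common cyclical factors from a panel that may contain nonstationary series, assuming a simple difference-then-PCA pipeline (the paper's actual transformation may differ):

```python
import numpy as np

def cyclical_factors(X, k):
    """Illustrative factor extraction: difference every series (so
    nonstationary series become stationary and stationary ones remain
    so), standardize, then run ordinary PCA via the SVD.
    X: T x N panel of levels; returns the first k principal components."""
    dX = np.diff(X, axis=0)                    # same transform for all columns
    dX = (dX - dX.mean(0)) / dX.std(0)         # put series on a common scale
    U, S, Vt = np.linalg.svd(dX, full_matrices=False)
    return U[:, :k] * S[:k]                    # first k principal components
```

No column needs to be pre-classified as I(0) or I(1) before the factors are extracted.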
Publications
Model Selection for Multivalued-Treatment Policy Learning in Observational Studies (with Yue Fang and Haitian Xie). Journal of Business & Economic Statistics (2025). [link]
Abstract: This study investigates the policy learning problem in observational studies, where the treatment variable can be multivalued and the propensity scores are unknown. We approximate the optimal policy in a global policy class with infinite complexity (VC/Natarajan) dimension, using a sequence of sieve policy classes with finite complexity dimension. The optimal policy within each sieve class is estimated by maximizing the empirical welfare, constructed through the doubly robust moment condition and cross-fitting method. To select the suitable sieve space, we maximize the penalized empirical welfare, with the penalty determined by either the Rademacher complexity or a holdout method. We establish oracle inequalities that demonstrate the bias and variance tradeoff achieved by the data-driven policy estimator. We also investigate two specific sieve selections: (a) a monotone single index model and (b) a systematic discretization method, which uses conventional sieve results for smooth functions such as linear sieves and deep neural networks. In the empirical study, we apply our method to examine the policy of assigning individuals to job training of different lengths.
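The empirical-welfare step can be illustrated with a toy version: doubly robust (AIPW) scores for each treatment arm, then welfare maximization over a one-dimensional threshold class. The threshold class, the oracle nuisances, and all names below are illustrative simplifications (the paper uses estimated nuisances with cross-fitting and richer sieve classes):

```python
import numpy as np

def dr_scores(y, d, mu_hat, pi_hat):
    """AIPW score matrix: G[i, a] is an unbiased signal for the outcome
    individual i would obtain under arm a.
    mu_hat[i, a]: outcome regression; pi_hat[i, a]: propensity score."""
    n, _ = mu_hat.shape
    idx = np.arange(n)
    G = mu_hat.copy()
    G[idx, d] += (y - mu_hat[idx, d]) / pi_hat[idx, d]
    return G

def best_threshold_policy(x, G, grid):
    """Empirical welfare maximization over a toy policy class:
    assign arm 1 iff x > c, searching c over a grid."""
    idx = np.arange(len(x))
    welfare = [G[idx, (x > c).astype(int)].mean() for c in grid]
    j = int(np.argmax(welfare))
    return grid[j], welfare[j]
```

Richer sieve classes would replace the scalar threshold with, e.g., a monotone single index or a discretized deep-network policy, with a complexity penalty governing the choice of sieve.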
Strength in numbers: robust mechanisms for public goods with many agents (with Haitian Xie). Social Choice and Welfare (2023). [link]
Abstract: This study examines the mechanism design problem for public goods provision in a large economy with n independent agents. We propose a class of dominant-strategy incentive compatible and ex-post individually rational mechanisms, which we call the adjusted mean-thresholding (AMT) mechanisms. We show that when the cost of provision grows more slowly than the √n rate, the AMT mechanisms are both eventually ex-ante budget balanced and asymptotically efficient. When the cost grows faster than the √n rate, in contrast, we show that any incentive compatible, individually rational, and eventually ex-ante budget balanced mechanism must have provision probability converging to zero and hence cannot be asymptotically efficient. The AMT mechanisms have a simple form and are more informationally robust when compared to, for example, the second-best mechanism. This is because the construction of an AMT mechanism depends only on the first moment of the valuation distribution.
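The cost-growth dichotomy can be illustrated with a stylized provision rule in the spirit of mean-thresholding. Only the provision decision is sketched; the paper's AMT mechanisms pair such a rule with a specific adjustment term and a payment scheme that together deliver dominant-strategy incentive compatibility, ex-post individual rationality, and eventual budget balance:

```python
import numpy as np

def mean_threshold_provision(reports, cost, adjustment):
    """Stylized mean-thresholding rule: provide the public good when
    the average report clears the per-capita cost plus an adjustment
    term. The adjustment and the accompanying payments in the paper
    are what make truthful reporting a dominant strategy; they are
    omitted here."""
    n = len(reports)
    return bool(reports.mean() >= cost / n + adjustment)
```

When the cost grows slowly, the per-capita cost vanishes and provision occurs with probability approaching one; when it grows linearly, the per-capita cost stays above the mean valuation and provision fails.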