Working Papers
Do Industrial Policies Increase Trade Competitiveness? (with Yueling Huang, Sandra Baquie, Florence Jaumotte, Jaden Kim, Rafael Machado Parente, Samuel Pienknagura)
IMF Working Paper
Industrial policies (IPs) are on the rise. The most common motive for pursuing IPs is to boost the strategic competitiveness of the targeted products. Leveraging a novel database of industrial policies and using the local projection difference-in-differences approach, this paper examines the dynamic relationship between IPs and trade competitiveness. Our results point to a nuanced picture. On average, products targeted by IPs experience a larger increase in competitiveness than non-targeted ones. However, there is substantial heterogeneity across different types of products and policy instruments. The average effect is driven by initially competitive products. Turning to policy instruments, domestic subsidies are associated with a temporary improvement in trade competitiveness in the short term, whereas export incentives are linked to medium-term improvements in competitiveness. Finally, we focus on three widely discussed value chains (solar photovoltaic, wind turbines, and electric vehicles) and present suggestive evidence that IPs can have spillover effects on non-targeted products through value chain linkages. Our findings for these three value chains suggest that IPs targeting upstream products are associated with larger improvements in the revealed comparative advantage (RCA) of products using these upstream products, relative to IPs targeting products at the same value chain stage.
Monetary Policy Forecasting from Central Bank Communications: An Embedding-Based Approach (with Man Chon Iao)
Scheduled to Present at AIFIN25 Workshop
This paper explores the use of large language models (LLMs) to forecast monetary policy decisions and quantify monetary policy shocks from central bank communications. Unlike prior approaches that rely on expert-defined economic dimensions and sentiment scoring, we use LLM-based embeddings to obtain document-level representations of Bank of England Monetary Policy Committee (MPC) minutes. Crucially, to mitigate potential lookahead bias, we anonymize all documents by removing temporal references and committee member names. We evaluate our approach on two tasks: (1) future policy action prediction, and (2) macroeconomic effects estimation via a structural vector autoregression framework. Our method is benchmarked against macroeconomic models and sentiment-based classifiers. We find that LLM-derived representations improve out-of-sample prediction accuracy and yield impulse responses consistent with economic theory. These results highlight the potential of LLMs to connect qualitative policy text with quantitative macroeconomic modeling in a scalable and robust way.
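The prediction task described above can be illustrated with a minimal, hypothetical sketch: given precomputed document embeddings of MPC minutes labeled with the subsequent policy action, classify a new document by cosine similarity to per-action centroids. This is an illustrative stand-in, not the paper's actual model or pipeline; the function names and the tiny two-dimensional vectors are invented for the example.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_action(minutes_embedding, labeled_embeddings):
    """Assign the policy action (e.g. 'hike', 'hold', 'cut') whose centroid
    of past minutes embeddings is closest in cosine similarity."""
    centroids = {action: centroid(vs)
                 for action, vs in labeled_embeddings.items()}
    return max(centroids, key=lambda a: cosine(minutes_embedding, centroids[a]))
```

In practice the embeddings would come from an LLM embedding endpoint and the classifier would be trained rather than centroid-based, but the sketch conveys the core idea of mapping document-level representations to policy decisions.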
(Draft Available Upon Request)
Feeding LLM Annotations to BERT Classifiers at Your Own Risk (with Kazimier Smith)
Under Review
Abstract: Using LLM-generated labels to fine-tune smaller encoder-only models for text classification has gained popularity in various settings. While this approach may be justified in simple and low-stakes applications, we conduct an empirical analysis to demonstrate how the perennial curse of training on synthetic data manifests itself in this specific setup. Compared to models trained on gold labels, we observe not only the expected performance degradation in accuracy and F1 score, but also increased instability across training runs and premature performance plateaus. These findings cast doubt on the reliability of such approaches in real-world applications. We contextualize the observed phenomena through the lens of error propagation and offer several practical mitigation strategies, including entropy-based filtering and ensemble techniques. Although these heuristics offer partial relief, they do not fully resolve the inherent risks of propagating non-random errors from LLM annotations to smaller classifiers, underscoring the need for caution when applying this workflow in high-stakes text classification tasks.
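One of the mitigation strategies mentioned above, entropy-based filtering, can be sketched in a few lines: query the LLM several times per example, compute the Shannon entropy of the resulting label distribution, and keep only examples where the annotations agree. This is a simplified illustration under assumed inputs (a dict of example IDs to repeated labels), not the paper's exact implementation or thresholds.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution for one example."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_filter(examples, threshold=0.5):
    """Keep only examples whose repeated LLM annotations agree enough.

    `examples` maps an example id to the labels returned by repeated LLM
    queries. Low entropy means high self-agreement, i.e. a more trustworthy
    synthetic label to pass on to the smaller classifier.
    """
    return {ex: labels for ex, labels in examples.items()
            if label_entropy(labels) <= threshold}
```

A unanimous triple like `["pos", "pos", "pos"]` has entropy 0 and survives, while a 2-to-1 split has entropy of about 0.92 bits and is filtered at the 0.5 threshold; as the abstract notes, such heuristics reduce but do not eliminate the propagation of non-random annotation errors.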
https://arxiv.org/abs/2504.15432
Published
Can Reasoning LLMs Synthesize Complex Climate Statements?
ACL 2025 Workshop ClimateNLP
Abstract: Accurately synthesizing climate evidence into concise statements is crucial for policy making and fostering public trust in climate science. Recent advancements in Large Language Models (LLMs), particularly the emergence of reasoning-optimized variants excelling at mathematical and logical tasks, present a promising yet untested opportunity for scientific evidence synthesis. We evaluate state-of-the-art reasoning LLMs on two key tasks: (1) contextual confidence classification, assigning appropriate confidence levels to climate statements based on evidence, and (2) factual summarization of climate evidence, generating concise summaries evaluated for coherence, faithfulness, and similarity to expert-written versions. Using a novel dataset of 612 structured examples constructed from the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC), we find reasoning LLMs outperform general-purpose models in confidence classification by 8 percentage points in accuracy and macro-F1 scores. However, for summarization tasks, performance differences between model types are mixed. Our findings demonstrate that reasoning LLMs show promise as auxiliary tools for confidence assessment in climate evidence synthesis, while highlighting significant limitations in their direct application to climate evidence summarization. This work establishes a foundation for future research on the targeted integration of LLMs into scientific assessment workflows.
https://aclanthology.org/2025.climatenlp-1.21
Tracking Green Industrial Policies with LLM: A Demonstration
ACL 2025 Workshop NLP for Positive Impact
Green industrial policies (GIPs) are government interventions that support environmentally sustainable economic growth through targeted incentives, regulations, and investments in clean technologies. As the backbone of climate mitigation and adaptation, GIPs deserve systematic documentation and analysis. However, two major hurdles impede this systematic documentation. First, unlike other climate policy documents such as Nationally Determined Contributions (NDCs), which are centrally curated, GIPs are scattered across numerous pieces of government legislation and policy announcements. Second, extracting information from these diverse documents is expensive when relying on expert annotation. We address this gap by proposing GreenSpyder, an LLM-based workflow that actively monitors, classifies, and annotates GIPs from open-source information. As a demonstration, we benchmark LLM performance in classifying and annotating GIPs on a small expert-curated dataset. Our results show that LLMs can be quite effective for classification and coarse annotation tasks, though they still need improvement for more nuanced classification. Finally, as a real-world application, we apply GreenSpyder to U.S. Legislative Records from the 117th Congress, paving the way for more comprehensive LLM-based GIP documentation in the future.
https://aclanthology.org/2025.nlp4pi-1.1
Other Writings
On Dicing with Death: Defending Causal Decision Theory Against Uncanny Correlation
Abstract: This paper defends Causal Decision Theory (CDT) against an alleged counterexample. In Dicing with Death (2014), Arif Ahmed devises a decision scenario in which the recommendation given by Causal Decision Theory apparently contradicts our intuition about the correct course of action. Like many other alleged counterexamples to CDT, Ahmed's story features an adversary (Death himself, in this case) with fantastic predictive power. Unlike many other alleged counterexamples, however, Ahmed explicitly includes the fundamental use of randomization as a possible action for the agent. This paper assesses these two features of Ahmed's story. It argues that Death's fantastic predictive power in this case cannot be taken for granted and some explanation must be given; otherwise the decision scenario Ahmed proposes would be incoherent, or at least incomplete. After considering a few such explanations, however, it becomes unclear whether the initial intuition that CDT apparently contradicts still holds up. We conclude that biting the bullet is a legitimate response from CDT to many similar cases where evidentially correlated but causally isolated acts seem to force CDT to give counterintuitive recommendations.
https://philsci-archive.pitt.edu/id/eprint/25128