Working Papers
Abstract: Industrial policies (IPs) are on the rise. The most common motive for pursuing IPs is to boost the strategic competitiveness of targeted products. Leveraging a novel database of industrial policies and using the local projection difference-in-differences approach, this paper examines the dynamic relationship between IPs and trade competitiveness. Our results point to a nuanced picture. On average, products targeted by IPs experience a larger increase in competitiveness than non-targeted ones. However, there is substantial heterogeneity across types of products and policy instruments. The average effect is driven by initially competitive products. Turning to policy instruments, domestic subsidies are associated with a temporary, short-term improvement in trade competitiveness, whereas export incentives are linked to medium-term improvements in competitiveness. Finally, we focus on three widely discussed value chains (solar photovoltaic, wind turbines, and electric vehicles) and present suggestive evidence that IPs can have spillover effects on non-targeted products through value chain linkages. Our findings for these three value chains suggest that IPs targeting upstream products are associated with larger improvements in the revealed comparative advantage (RCA) of products that use these upstream inputs than IPs targeting products at the same value chain stage.
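For context, a common form of the RCA measure is the Balassa index (a plausible sketch; the paper's exact definition may differ):

\[
\mathrm{RCA}_{c,p} \;=\; \frac{X_{c,p} \,\big/\, \sum_{p'} X_{c,p'}}{\sum_{c'} X_{c',p} \,\big/\, \sum_{c',p'} X_{c',p'}},
\]

where \(X_{c,p}\) is country \(c\)'s exports of product \(p\): values above 1 indicate that the product is over-represented in the country's export basket relative to the world.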
Under Review at ICLR 2026, arXiv version
Abstract: Large Language Models (LLMs) open new possibilities for constructing realistic and interpretable macroeconomic simulations. We present SimCity, a multi-agent framework that leverages LLMs to model an interpretable macroeconomic system with heterogeneous agents and rich interactions. Unlike classical equilibrium models that limit heterogeneity for tractability, or traditional agent-based models (ABMs) that rely on hand-crafted decision rules, SimCity enables flexible, adaptive behavior with transparent natural-language reasoning. Within SimCity, four core agent types (households, firms, a central bank, and a government) deliberate and participate in a frictional labor market, a heterogeneous goods market, and a financial market. Furthermore, a Vision-Language Model (VLM) determines the geographic placement of new firms and renders a mapped virtual city, allowing us to study both macroeconomic regularities and urban expansion dynamics within a unified environment. To evaluate the framework, we compile a checklist of canonical macroeconomic phenomena, including price elasticity of demand, Engel's Law, Okun's Law, the Phillips Curve, and the Beveridge Curve, and show that SimCity naturally reproduces these empirical patterns while remaining robust across simulation runs.
Abstract: Using LLM-generated labels to fine-tune smaller encoder-only models for text classification has gained popularity in various settings. While this approach may be justified in simple and low-stakes applications, we conduct an empirical analysis to demonstrate how the perennial curse of training on synthetic data manifests itself in this specific setup. Compared to models trained on gold labels, we observe not only the expected degradation in accuracy and F1 score, but also increased instability across training runs and premature performance plateaus. These findings cast doubt on the reliability of such approaches in real-world applications. We contextualize the observed phenomena through the lens of error propagation and offer several practical mitigation strategies, including entropy-based filtering and ensemble techniques. Although these heuristics offer partial relief, they do not fully resolve the inherent risks of propagating non-random errors from LLM annotations to smaller classifiers, underscoring the need for caution when applying this workflow in high-stakes text classification tasks.
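As a rough illustration of the entropy-based filtering idea (a minimal sketch; the paper's actual procedure, threshold, and probability-elicitation method are not specified here):

```python
import numpy as np

def entropy_filter(label_probs: np.ndarray, max_entropy: float = 0.5) -> np.ndarray:
    """Return a boolean mask keeping examples whose LLM label
    distribution is confident (low entropy).

    label_probs: (n_examples, n_classes), e.g. estimated by sampling
    the LLM several times per example and tallying the labels.
    """
    eps = 1e-12
    h = -np.sum(label_probs * np.log(label_probs + eps), axis=1)
    return h <= max_entropy

# Example: keep only confident synthetic labels before fine-tuning.
probs = np.array([[0.9, 0.1], [0.55, 0.45]])
print(entropy_filter(probs))  # [ True False]
```

One could layer an ensemble on top by averaging label distributions from several prompts or models before filtering.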
Accepted to the AIFIN25 Workshop (unable to present due to logistics); draft available upon request
Abstract: This paper explores the use of large language models (LLMs) to forecast monetary policy decisions and quantify monetary policy shocks from central bank communications. Unlike prior approaches that rely on expert-defined economic dimensions and sentiment scoring, we use LLM-based embeddings to obtain document-level representations of Bank of England Monetary Policy Committee (MPC) minutes. Crucially, to mitigate potential lookahead bias, we anonymize all documents by removing temporal references and committee member names. We evaluate our approach on two tasks: (1) prediction of future policy actions, and (2) estimation of macroeconomic effects via a structural vector autoregression (SVAR) framework. Our method is benchmarked against macroeconomic models and sentiment-based classifiers. We find that LLM-derived representations improve out-of-sample prediction accuracy and yield impulse responses consistent with economic theory. These results highlight the potential of LLMs to connect qualitative policy text with quantitative macroeconomic modeling in a scalable and robust way.
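A toy illustration of the first task (a minimal sketch, not the paper's pipeline: random vectors stand in for the LLM embeddings, and the classifier choice is an assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for document-level embeddings of anonymized MPC minutes;
# in the paper these come from an LLM embedding model, not random noise.
X = rng.normal(size=(200, 64))     # 200 minutes, 64-dim embeddings
y = rng.integers(-1, 2, size=200)  # next policy action: -1 cut, 0 hold, +1 hike

split = int(0.8 * len(X))  # chronological split keeps evaluation out-of-sample
clf = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
print("out-of-sample accuracy:", clf.score(X[split:], y[split:]))
```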
Under Review at WWW 2026
Abstract: Public funding plays a central role in driving scientific discovery. To better understand the link between research inputs and outputs, we introduce FIND (Funding–Impact NSF Database), an open-access dataset that systematically links NSF grant proposals to their downstream research outputs, including publication metadata and abstracts. The primary contribution of this project is the creation of a large-scale, structured dataset that enables transparency, impact evaluation, and metascience research on the returns to public funding. To illustrate the potential of FIND, we present two proof-of-concept NLP applications. First, we analyze whether the language of grant proposals can predict the subsequent citation impact of funded research. Second, we leverage large language models to extract scientific claims from both proposals and resulting publications, allowing us to measure the extent to which funded projects deliver on their stated goals. Together, these applications highlight the utility of FIND for advancing metascience, informing funding policy, and enabling novel AI-driven analyses of the scientific process.
Published
Proceedings of the ACL 2025 Workshop on NLP for Positive Impact
Abstract: Green industrial policies (GIPs) are government interventions that support environmentally sustainable economic growth through targeted incentives, regulations, and investments in clean technologies. As the backbone of climate mitigation and adaptation, GIPs deserve systematic documentation and analysis. However, two major hurdles impede this effort. First, unlike other climate policy documents such as Nationally Determined Contributions (NDCs), which are centrally curated, GIPs are scattered across numerous pieces of government legislation and policy announcements. Second, extracting information from these diverse documents is expensive when relying on expert annotation. We address this gap by proposing GreenSpyder, an LLM-based workflow that actively monitors, classifies, and annotates GIPs from open-source information. As a demonstration, we benchmark LLM performance in classifying and annotating GIPs on a small expert-curated dataset. Our results show that LLMs can be quite effective for classification and coarse annotation tasks, though they still fall short on more nuanced classification. Finally, as a real-world application, we apply GreenSpyder to U.S. Legislative Records from the 117th Congress, paving the way for more comprehensive LLM-based GIP documentation in the future.
Abstract: Accurately synthesizing climate evidence into concise statements is crucial for policy making and fostering public trust in climate science. Recent advancements in Large Language Models (LLMs), particularly the emergence of reasoning-optimized variants excelling at mathematical and logical tasks, present a promising yet untested opportunity for scientific evidence synthesis. We evaluate state-of-the-art reasoning LLMs on two key tasks: (1) contextual confidence classification, assigning appropriate confidence levels to climate statements based on evidence, and (2) factual summarization of climate evidence, generating concise summaries evaluated for coherence, faithfulness, and similarity to expert-written versions. Using a novel dataset of 612 structured examples constructed from the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC), we find that reasoning LLMs outperform general-purpose models in confidence classification by 8 percentage points in both accuracy and macro-F1. However, for summarization tasks, performance differences between model types are mixed. Our findings demonstrate that reasoning LLMs show promise as auxiliary tools for confidence assessment in climate evidence synthesis, while highlighting significant limitations in their direct application to climate evidence summarization. This work establishes a foundation for future research on the targeted integration of LLMs into scientific assessment workflows.
Accepted to NeurIPS 2025, link available soon
Abstract: We present MetaFind, a scene-aware multi-modal retrieval framework designed to enhance scene generation in the metaverse by retrieving 3D assets from large-scale repositories. MetaFind addresses two core challenges: (i) inconsistent asset retrieval that overlooks spatial, semantic, and stylistic constraints, and (ii) the absence of a standardized retrieval paradigm specifically tailored for 3D asset retrieval, as existing approaches predominantly rely on general-purpose 3D shape representation models. Our key innovation is a retrieval mechanism that enhances both spatial reasoning and style consistency by jointly modeling object-level features (including appearance) and scene-level layout structures. Methodologically, MetaFind introduces a plug-and-play layout encoder that captures both spatial relationships and object appearance features, ensuring retrieved 3D assets are contextually and stylistically coherent with the existing scene. The framework supports iterative scene construction by continuously adapting retrieval results to current scene updates. Empirical evaluations demonstrate the improved spatial and stylistic consistency of MetaFind in various retrieval tasks compared to baseline methods.
Work in Progress
Other Projects
Fun project as an aviation enthusiast
I built a cascade system: the first stage is a fine-tuned Whisper model, and the second stage is an LLM that uses location context to correct the transcriptions. A rough sketch of the idea is below.
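A minimal sketch of the cascade, assuming the stock openai-whisper package as a stand-in for the fine-tuned stage-one model, with the stage-two LLM call left as a stub:

```python
import whisper  # pip install openai-whisper

asr = whisper.load_model("small")  # stand-in; the demo uses a fine-tuned checkpoint

def correct_with_llm(raw_text: str, location: str) -> str:
    # Stage 2: a chat LLM rewrites the transcript using location context
    # (nearby airports, runways, waypoints). Wire in any LLM client here.
    prompt = (f"Correct this aviation radio transcript recorded near "
              f"{location}. Fix callsigns, runways, and waypoints:\n{raw_text}")
    # return llm_client.complete(prompt)  # hypothetical client call
    return raw_text  # stub: passes stage-1 output through unchanged

def transcribe(audio_path: str, location: str) -> str:
    raw = asr.transcribe(audio_path)["text"]  # stage 1: Whisper ASR
    return correct_with_llm(raw, location)    # stage 2: LLM correction
```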
Email me if you want to see a demo.
Old paper from my "philosopher" days, PhilSci-Archive
Abstract: This paper defends Causal Decision Theory (CDT) against an alleged counterexample. In Dicing with Death (2014), Arif Ahmed devises a decision scenario in which the recommendation given by Causal Decision Theory apparently contradicts our intuition about the correct course of action. Similar to many other alleged counterexamples to CDT, Ahmed’s story features an adversary (Death himself, in this case) with fantastic predictive power. Unlike many other alleged counterexamples, however, Ahmed explicitly includes a fundamental use of randomization as a possible action for the agent. This paper assesses these two features of Ahmed’s story. It argues that Death’s fantastic predictive power cannot be taken for granted and some explanation must be given; otherwise, the decision scenario Ahmed proposes would be incoherent, or at least incomplete. After considering a few such explanations, however, it becomes unclear whether the initial intuition that CDT apparently contradicts still holds up. We conclude that biting the bullet is a legitimate response from CDT to many similar cases where evidentially correlated but causally isolated acts seem to force CDT to give counterintuitive recommendations.