Research

Publications in Journals

Economics and Finance

Al-Haschimi A., Apostolou A., Azqueta-Gavaldon A. and Ricci M. (2026). Assessing financial risk in China: a text-based indicator approach, Journal of International Money and Finance. https://doi.org/10.1016/j.jimonfin.2025.103514

Abstract: This paper examines the complexities and global repercussions of financial risk in China by developing a novel text-based financial risk indicator tailored to the Chinese context. We identify key themes such as banking, financial markets, exchange rates, real estate, corporate profitability, and corporate investment, which significantly influenced financial risk during critical periods like the 2015–16 financial stress episode and the 2021 Evergrande default. Utilizing a structural Vector Autoregression (VAR) framework, we analyze the impact of the aggregated financial risk indicator on both the Chinese and the global economy. Our findings reveal that heightened financial risk in China correlates with a drop in equity prices and has substantial global repercussions, including contractions in world industrial production, declines in global oil prices, and widened emerging market government bond spreads.

Azqueta-Gavaldon A. (2023). Political referenda and investment: evidence from Scotland, European Journal of Political Economy, https://doi.org/10.1016/j.ejpoleco.2023.102474 [DATA]

Abstract: We present evidence that referenda have a significant, detrimental outcome on investment. Employing an unsupervised machine learning algorithm over the period 2008-2017, we construct three important uncertainty indices underlying reports in the Scottish news media: Scottish independence (IndyRef)-related uncertainty; Brexit-related uncertainty; and Scottish policy-related uncertainty. Examining the relationship of these indices with investment on a longitudinal panel of 3,589 Scottish firms, the evidence suggests that Brexit-related uncertainty associates more strongly than IndyRef-related uncertainty to investment. Our preferred specification suggests that a one standard deviation increase in Brexit uncertainty foreshadows a reduction in investment by 8% on average in the following year. Besides, we find that the uncertainty associated with the Scottish referendum for independence, while negligible at the aggregate level, relates more strongly with the investment of listed firms as well as those operating on the border with England. In addition, we present evidence of greater sensitivity to these indices among firms that are financially constrained or whose investment is to a greater degree irreversible.

Azqueta-Gavaldon A., Hirschbühl D., Onorante L., and Saiz L. (2023). Sources of economic policy uncertainty in the euro area, European Economic Review, https://doi.org/10.1016/j.euroecorev.2023.104373

Abstract: We create economic policy uncertainty (EPU) indicators for the four largest euro area countries by applying two unsupervised machine learning algorithms to news articles. The procedure allows us to uncover components of EPU endogenously for the four European languages. The uncertainty indices computed from January 2000 to May 2019 capture episodes of regulatory change, trade tensions and financial stress. In an evaluation exercise, we use a structural vector autoregression model to study the effects of uncertainty on investment and on private consumption. We document considerable effects for the political and domestic regulation uncertainty components on investment, while the other types show heterogeneous effects across countries. For instance, trade uncertainty influences Germany’s investment more than its counterparts. Moreover, we observe strong negative effects of uncertainty on consumption for countries such as Italy (political) and Spain (fiscal, political and domestic regulation).

Azqueta-Gavaldon A. (2017) Developing news-based Economic Policy Uncertainty index with unsupervised machine learning. Economics Letters, 158, 47-50.

[CODES]

Abstract: I propose creating a news-based Economic Policy Uncertainty (EPU) index by employing an unsupervised algorithm able to deduce the subject of each article without the need for pre-labeled data. This approach economizes on costly human classification to pre-define a set of keywords.

Data: The time series created by the unsupervised algorithm can be found here.

Data Science

Wallis J., Azqueta-Gavaldon A., Ananthakumar T., Dürichen R., and Albergante L. (2022). Similarity-based prediction of ejection fraction in heart failure patients. Informatics in Medicine Unlocked, Volume 32: https://doi.org/10.1016/j.imu.2022.101035

[Arxiv]

Abstract: Biomedical research is increasingly employing real world evidence (RWE) to foster discoveries of novel clinical phenotypes and to better characterize long term effect of medical treatments. However, due to limitations inherent in the collection process, RWE often lacks key features of patients, particularly when these features cannot be directly encoded using data standards such as ICD-10. Here, we propose a novel data-driven statistical machine learning approach, named Feature Imputation via Local Likelihood (FILL), designed to infer missing features by exploiting feature similarity between patients. We test our method using a particularly challenging problem: differentiating heart failure patients with reduced versus preserved ejection fraction (HFrEF and HFpEF respectively). The complexity of the task stems from three aspects: the two share many common characteristics and treatments, only part of the relevant diagnoses may have been recorded, and the information on ejection fraction is often missing from RWE datasets. Despite these difficulties, our method is shown to be capable of inferring heart failure patients with HFpEF with a precision above 80% when considering multiple scenarios across two RWE datasets containing 11,950 and 10,051 heart failure patients. This is an improvement when compared to classical approaches such as logistic regression and random forest which were only able to achieve a precision < 73%. Finally, this approach allows us to analyse which features are commonly associated with HFpEF patients. For example, we found that specific diagnostic codes for atrial fibrillation and personal history of long-term use of anticoagulants are often key in identifying HFpEF patients.

Azqueta-Gavaldon A. (2019). Causal inference between cryptocurrency narratives and prices: Evidence from a complex dynamic ecosystem. Physica A: Statistical Mechanics and its Applications, Volume 537: https://doi.org/10.1016/j.physa.2019.122574

[CODES] [TOPICS VISUALIZATION]

Abstract: In this note, I explore the causal relationship between narratives propagated by the media and crypto prices. Firstly, I unveil four cryptocurrency-related narratives: investment, technological innovation, security breaches and regulation. Secondly, after acknowledging their tone (sentiment), I apply Convergent Cross Mapping (CCM) to assess the causal relationship between narratives and prices. I find strong bi-directional causal relationships between narratives concerning investment and regulation while a uni-directional causal association exists in narratives relating technology and security to prices. Therefore, this work contributes to the recent economic literature that connects consumer behaviour to narratives.

Azqueta-Gavaldon A. (2017): Financial Investment and economic policy uncertainty in the UK. IML '17 Proceedings of the 1st International Conference on Internet of Things and Machine Learning. https://dl.acm.org/citation.cfm?id=3158380

Abstract: UK based financial firms following Brexit reported net disinvestment of 15 billion pounds. This was the fifth time financial disinvestment occurred since the production of this data: 1987. Parallel to this event, Economic Policy Uncertainty (EPU) in the UK experienced its biggest rise during Brexit June 2016. This note studies the relationship between EPU and its particular components and financial investment. I find that overall EPU and specifically fiscal policy, monetary policy, geopolitical, regulation and liquidity uncertainty have the highest negative sensitivity to financial investment.

Other publications

Azqueta-Gavaldon A. and Ramos Cosgrove (2025). Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification. Arxiv: https://arxiv.org/abs/2507.11086

Abstract: The growing prevalence of cross-border financial activities in global markets has underscored the necessity of accurately identifying and classifying foreign entities. This practice is essential within the Spanish financial system for ensuring robust risk management, regulatory adherence, and the prevention of financial misconduct. This process involves a labor-intensive entity-matching task, where entities need to be validated against available reference sources. Challenges arise from linguistic variations, special characters, outdated names, and changes in legal forms, complicating traditional matching algorithms like Jaccard, cosine, and Levenshtein distances. These methods struggle with contextual nuances and semantic relationships, leading to mismatches. To address these limitations, we explore Large Language Models (LLMs) as a flexible alternative. LLMs leverage extensive training to interpret context, handle abbreviations, and adapt to legal transitions. We evaluate traditional methods, Hugging Face-based LLMs, and interface-based LLMs (e.g., Microsoft Copilot, Alibaba's Qwen 2.5) using a dataset of 65 Portuguese company cases. Results show traditional methods achieve accuracies over 92% but suffer high false positive rates (20-40%). Interface-based LLMs outperform, achieving accuracies above 93%, F1 scores exceeding 96%, and lower false positives (40-80%).

Andrés Alonso-Robisco, Andrés Azqueta-Gavaldón, José Manuel Carbó, José Luis González, Ana Isabel Hernáez, José Luis Herrera, Jorge Quintana and Javier Tarancón. (2025). Empowering financial supervision: a SupTech experiment using machine learning in an early warning system. Banco de España Occasional Paper Issue 2504.

Abstract: New technologies have made available a vast amount of new data in the form of text, recording an exponentially increasing share of human and corporate behavior. For financial supervisors, the information encoded in text is a valuable complement to the more traditional balance sheet data typically used to track the soundness of financial institutions. In this study, we exploit several natural language processing (NLP) techniques as well as network analysis to detect anomalies in the Spanish corporate system, identifying both idiosyncratic and systemic risks. We use sentiment analysis at the corporate level to detect sentiment anomalies for specific corporations (idiosyncratic risks), while employing a wide range of network metrics to monitor systemic risks. In the realm of supervisory technology (SupTech), anomaly detection in sentiment analysis serves as a proactive tool for financial authorities. By continuously monitoring sentiment trends, SupTech applications can provide early warnings of potential financial distress or systemic risks.

Azqueta-Gavaldon A, Diakonova M, Ghirelli C. and Perez J. (2023) Sources of economic policy uncertainty in the euro area: a ready-to-use database, Banco de España Occasional Paper Issue 2314

Abstract: In this paper, we build a publicly-available database of economic policy uncertainty (EPU) indicators based on the methodology proposed by Azqueta-Gavaldón, Hirschbühl, Onorante and Saiz (2023), which uses topic modelling techniques to identify distinct components of EPU. This database is regularly updated and can be accessed on the Banco de España’s website. Currently, the dataset covers the four largest countries in the euro area, namely Spain, Italy, France, and Germany. Our data coverage is continually expanding to include more euro area countries. Additionally, we compute the aggregated EPU indexes for the euro area. This comprehensive dataset and the resulting euro area indexes provide valuable tools for researchers, policymakers and analysts to assess and monitor the dynamics of economic policy uncertainty in real time.

Azqueta-Gavaldon A., Hirschbühl D., Onorante L., and Saiz L. (2020): Nowcasting business cycle turning points with stock networks and machine learning, Working Paper Series 2494, European Central Bank

Abstract: We propose a granular framework that makes use of advanced statistical methods to approximate developments in economy-wide expected corporate earnings. In particular, we evaluate the dynamic network structure of stock returns in the United States as a proxy for the transmission of shocks through the economy and identify node positions (firms) whose connectedness provides a signal for economic growth. The nowcasting exercise, with both the in-sample and the out-of-sample consistent feature selection, highlights which firms are contemporaneously exposed to aggregate downturns and provides a more complete narrative than is usually provided by more aggregate data. The two-state model for predicting periods of negative growth can remarkably well predict future states by using information derived from the node-positions of manufacturing, transportation and financial (particularly insurance) firms. The three-states model, which identifies high, low and negative growth, successfully predicts economic regimes by making use of information from the financial, insurance, and retail sectors.

Azqueta-Gavaldon A., Hirschbühl D., Onorante L., and Saiz L. (2019): Sources of economic policy uncertainty in the euro area: a machine learning approach, Economic Bulletin Boxes, European Central Bank, vol. 5.

Work in progress

Light exposure and infant health (joint with Gunyi Yang and Ella Reese-Clauson)

Abstract: In an era where urbanization and technological advancements shape our environments, the pervasive presence of artificial light at night has raised concerns about its potential effects on human health. While research has extensively explored the impact of light pollution on adult well-being, there remains a notable gap in understanding its implications for vulnerable populations, particularly infants. One of the challenges in addressing this gap lies in the limited track records on the specific responses of infants to prolonged exposure to artificial light. To this end we use the National Vital Statistics System (NVSS) dataset which is the most complete data on births and deaths in the United States. Through a comprehensive analysis of data encompassing environmental light levels and infant health records, we aim to shed light on the potential correlations and implications for neonatal health. The findings of this research hold promise in advancing our understanding of environmental factors influencing infant well-being and may contribute to the development of targeted interventions for healthier early childhood development.

Revising Economic Uncertainty: beyond EPU (joint with Javier Perez, Corinna Ghirelli and Marina Diakonova)

Vaccination narratives (joint with Theodoris Koutmeridis and Kleanthis Arampatzis)

Abstract: In the landscape of modern public health, vaccination discourse is intricately interwoven with social media narratives. This paper employs text-mining techniques to unravel sentiments encompassing vaccination rates in the United Kingdom, with a specific focus on pregnant women – a vulnerable group that had to navigate uncertainties and changes in vaccination recommendations. Amid the information deluge, clear policy communication emerges as paramount. Our study underscores the symbiotic relationship between sentiment and policy communication, emphasizing the need for effective strategies that bridge information gaps and promote informed decision-making. By examining public perceptions, we contribute to the ongoing dialogue on health communication’s vital role in shaping vaccination attitudes.

Developing a real estate yield investment device using granular data and machine learning (joint with Gonzalo Azqueta-Gavaldon, Monica Azqueta-Gavaldon, and Inigo Azqueta-Gavaldon) [arXiv]

Abstract: This project aims at creating an investment device to help investors determine which real estate units have a higher return to investment in Madrid. The idea is simple: determine what is the rental price of a unit per month and how much would the cost of the mortgage be. To do so, we gather data from Idealista.com, a real estate web-page with millions of real estate units across Spain, Italy and Portugal. In this note, we present the road map on how we gather the data, descriptive statistics of the 8,121 real estate units used (rental and sale); build a return index based on the difference in prices of rental and sale units (per neighborhood and size) and introduce machine learning algorithms for rental real estate price prediction.
