Climate Attention Index (CAI)
We construct a climate attention index that measures the extent to which climate change is discussed in the news media. Our method focuses on newspapers with a significant presence on Twitter across many countries. This approach allows us to compile a large dataset of over 23 million tweets, which we then aggregate at the country level at daily, weekly, monthly, and quarterly frequencies. We compare the aggregated text to a corpus of authoritative texts on climate change, similar to the method used by Engle et al. (2020). Our data cover 25 countries that span a wide range of local languages, income levels, and geographical regions. Our country-level climate attention index is publicly available for download here.
GLOBAL CLIMATE ATTENTION INDEX (CAI)
Notes: This figure shows our Global Climate Attention Index. The index is constructed by taking the equally weighted average across all the countries in our sample.
Top days. We use high-frequency data to identify the impact of climate news shocks on exchange rates and stock returns. To identify innovations in our global index, we first calculate the daily growth rate of the raw CAI for each country and then remove its mean. These demeaned country-level series are winsorized at the 99.9% level and aggregated into a global measure. Based on the distribution of innovations in our global climate index, we select the top 15% of these innovations.
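The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to build the index: all function names are hypothetical, the winsorization caps the upper tail only (the text does not say whether both tails are capped), the percentile uses linear interpolation, and the global measure is taken to be the equally weighted cross-country average.

```python
from statistics import mean

def growth_rates(levels):
    """Daily growth rate of a raw index series: (x_t - x_{t-1}) / x_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(levels, levels[1:])]

def demean(values):
    """Subtract the sample mean from every observation."""
    mu = mean(values)
    return [v - mu for v in values]

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def winsorize_upper(values, q=99.9):
    """Cap values above the q-th percentile (assumption: upper tail only)."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]

def global_innovations(country_levels, q=99.9):
    """Equally weighted average of demeaned, winsorized country growth rates.

    country_levels maps a country name to its raw daily index levels;
    all series are assumed to cover the same dates.
    """
    per_country = [
        winsorize_upper(demean(growth_rates(levels)), q)
        for levels in country_levels.values()
    ]
    return [mean(day) for day in zip(*per_country)]

def top_days(innovations, share=0.15):
    """Positions of days whose innovation lies in the top `share` of the distribution."""
    cutoff = percentile(innovations, 100 * (1 - share))
    return [i for i, v in enumerate(innovations) if v >= cutoff]
```

With 20 daily innovations, `top_days` returns the three largest, which corresponds to the top 15%.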
INNOVATIONS TO GLOBAL CLIMATE ATTENTION INDEX (CAI)
Notes: This figure shows innovations to our equally weighted Global Climate Attention Index. Red dots are classified as climate-related news days (top-15% realizations). Weekends are excluded.
Topics for top days. After identifying top-attention days, we examine the content of the news shared during these periods. Following the text analysis literature (see, for example, Hassan et al. (2019)), we employ a pre-trained sentence embedding model to extract semantic representations of newspaper tweets. These embeddings are then clustered to identify a smaller set of key themes, or main topics. This dimensionality reduction enables the use of topic-level cosine similarity scores to classify tweets as climate-related and further distinguish between those primarily associated with physical risk and transition risk. The process involves several carefully designed steps aimed at minimizing discretion on our part, which we detail below.
First, we generate 100 climate risk-related sentences using ChatGPT: 50 focused on physical risks (e.g., extreme weather) and 50 on transition risks (e.g., regulatory changes). Next, we analyze the identified top climate-attention days by preprocessing raw newspaper tweets, removing non-informative elements such as links and special characters. We then convert the cleaned tweets into high-dimensional vector representations (sentence embeddings), which serve as inputs for the BERTopic clustering algorithm (see Grootendorst (2022)).
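The cleaning step can be illustrated with a short Python sketch. The regular expressions and the function name `clean_tweet` are assumptions made for illustration; the text only states that links and special characters are removed. The embedding and BERTopic steps are not shown here.

```python
import re

# Links: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Keep Unicode word characters (so non-English text survives), whitespace,
# and basic punctuation; everything else counts as a "special character".
SPECIAL_PATTERN = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE_PATTERN = re.compile(r"\s+")

def clean_tweet(text):
    """Remove links and special characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = SPECIAL_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()
```

Because `\w` is Unicode-aware in Python 3, accented and non-Latin letters are preserved, which matters for a multilingual sample.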
After identifying the key topics discussed in the media on these top-attention days, we compute cosine similarities between each topic embedding and each AI-generated sentence embedding. This approach allows us to assess how closely each topic aligns with climate-related themes. Specifically, a topic is classified as climate-related if its cosine similarity score with any AI-generated climate sentence falls within the top 0.1% of the score distribution (with a threshold of 0.546). Using the 99.9th percentile ensures a stringent definition of climate-related topics.
Finally, we classify each climate-related topic as either “Physical Risk” or “Transition Risk” based on its closest match among the AI-generated sentences. If the most similar sentence corresponds to physical (transition) risks, the topic is labeled accordingly.
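The thresholding and labeling logic can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: the function names are hypothetical, embeddings are plain lists of floats, and the percentile is computed over the pooled scores of all topic-sentence pairs (the text does not specify how the score distribution is formed).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def classify_topics(topic_embeddings, labeled_sentences, q=99.9):
    """Flag climate-related topics and label them by their closest sentence.

    topic_embeddings: dict mapping a topic id to its embedding.
    labeled_sentences: list of (embedding, label) pairs, where label is
        "Physical Risk" or "Transition Risk".
    Returns (threshold, {topic_id: label}) for topics that pass the threshold.
    """
    scores = {
        topic_id: [
            (cosine_similarity(topic_vec, sent_vec), label)
            for sent_vec, label in labeled_sentences
        ]
        for topic_id, topic_vec in topic_embeddings.items()
    }
    # Assumption: the threshold is the q-th percentile of all pooled scores.
    pooled = [score for pairs in scores.values() for score, _ in pairs]
    threshold = percentile(pooled, q)

    labels = {}
    for topic_id, pairs in scores.items():
        best_score, best_label = max(pairs, key=lambda pair: pair[0])
        if best_score >= threshold:  # climate-related if any sentence qualifies
            labels[topic_id] = best_label
    return threshold, labels
```

Checking only each topic's best score is equivalent to the "any sentence" rule, since a topic passes the threshold exactly when its highest similarity does.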
Sentiment of tweets. We analyze the sentiment of climate-related tweets during top days by using the Twitter-XLM-RoBERTa model from CardiffNLP. This is a transformer-based multilingual sentiment classifier suitable for text in multiple languages (Barbieri et al. 2022). Each one of our tweets is tokenized and passed through the pre-trained model in order to obtain a sentiment label (negative, neutral, or positive) based on the highest logit value.
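The final labeling step, choosing the class with the highest logit, can be illustrated in Python. The sketch assumes the classifier returns three logits per tweet in the order negative, neutral, positive; this ordering and the function name are assumptions, and the tokenization and model forward pass are not shown.

```python
# Assumed output order of the classifier's logits (not confirmed by the text).
LABELS = ("negative", "neutral", "positive")

def sentiment_label(logits, labels=LABELS):
    """Return the label whose logit is largest."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]
```

Because the softmax is monotone, picking the largest logit gives the same label as picking the largest predicted probability.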
The figure below presents the share of climate-related tweets in each risk category (physical and transition) that are classified as either positive or negative. In our sample, global news shocks related to climate typically have a negative tone. Equivalently, our top-attention days are usually bad-news days.
SENTIMENT OF CLIMATE-RELATED TWEETS DURING TOP DAYS
Notes: This figure presents the share of climate-related tweets categorized by sentiment (positive or negative) and risk type (transition or physical) on days for which there is a positive spike (top-15%) in the global CAI innovations. The sentiment classification is based on a multilingual sentiment classifier (Barbieri et al. 2022).