InvestESG: A Multi-Agent Reinforcement Learning Benchmark for Studying Climate Investment as A Social Dilemma
Xiaoxuan Hou*, Jiayi (Carrie) Yuan*, Joel Z. Leibo, Natasha Jaques
xxhou@uw.edu, jiayiy9@cs.washington.edu, jzl@google.com, nj@cs.washington.edu
University of Washington
TLDR: We introduce InvestESG, a lightweight, GPU-efficient MARL environment simulating company and investor responses to ESG disclosure mandates, with companies and investors modeled as two types of selfish PPO agents. InvestESG predicts behaviors consistent with empirical evidence, highlighting MARL’s potential to complement traditional methods for policy and market design.
InvestESG is a novel multi-agent reinforcement learning (MARL) benchmark designed to study the impact of Environmental, Social, and Governance (ESG) disclosure mandates on corporate climate investments. The benchmark models an intertemporal social dilemma where companies balance short-term profit losses from climate mitigation efforts and long-term benefits from reducing climate risk, while ESG-conscious investors attempt to influence corporate behavior through their investment decisions. Companies allocate capital across mitigation, greenwashing, and resilience, with varying strategies influencing climate outcomes and investor preferences. Our experiments show that without ESG-conscious investors with sufficient capital, corporate mitigation efforts remain limited under the disclosure mandate. However, when a critical mass of investors prioritizes ESG, corporate cooperation increases, which in turn reduces climate risks and enhances long-term financial stability. Additionally, providing more information about global climate risks encourages companies to invest more in mitigation, even without investor involvement. Our findings align with empirical research using real-world data, highlighting MARL's potential to inform policy by providing insights into large-scale socio-economic challenges through efficient testing of alternative policy and market designs.
Environment Setup
Corporations: choose how much to invest in mitigating emissions, which affects their ESG score. They aim to maximize profits.
Investors: choose investment portfolios to maximize utility, which depends on the financial performance and the ESG score of the portfolio.
Climate risk over the course of a single 100-year episode with little mitigation.
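To make the investor objective concrete, here is a minimal sketch of a utility that depends on both financial performance and the portfolio's ESG score. The linear form, the function name `investor_utility`, and the parameter `esg_preference` are illustrative assumptions, not the benchmark's actual definition.

```python
def investor_utility(weights, returns, esg_scores, esg_preference):
    """Hypothetical linear utility (an assumption, not InvestESG's formula):
    portfolio return plus a preference-weighted portfolio ESG score."""
    # Portfolio-weighted financial return.
    financial = sum(w * r for w, r in zip(weights, returns))
    # Portfolio-weighted ESG score.
    esg = sum(w * s for w, s in zip(weights, esg_scores))
    return financial + esg_preference * esg


# Two companies: one with a higher return, one with a higher ESG score.
weights = [0.5, 0.5]
returns = [0.10, 0.02]
esg_scores = [0.0, 1.0]

profit_only = investor_utility(weights, returns, esg_scores, esg_preference=0.0)
esg_conscious = investor_utility(weights, returns, esg_scores, esg_preference=0.1)
```

Under this assumed form, an investor with `esg_preference=0.0` cares only about returns, while a positive value rewards holding companies with higher ESG scores.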
Climate Change as A Social Dilemma
The Schelling diagrams below show that the environment constitutes a social dilemma by default (Figure a), and that introducing ESG scores alone does not alleviate it (Figure b). A growing share of ESG-conscious investors incentivizes mitigation efforts and can potentially resolve the dilemma (Figure c). However, the ability to greenwash may turn the environment back into a social dilemma (Figure d).
The graphs compare the focal company's payoff from cooperation (mitigation, blue lines) with its payoff from defection (no mitigation, red lines), given a varying number of other cooperating companies. Yellow lines represent the average payoff across all companies when the focal company defects.
(a) 0/3 Conscious Investors
(b) 2/3 Conscious Investors
(c) 3/3 Conscious Investors
(d) 3/3 Conscious Investors & Greenwashing
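The three curves of a Schelling diagram can be sketched on a toy payoff function as follows. The payoff model and all numbers (`base`, `cost`, `damage`, five companies) are hypothetical and chosen only to illustrate the construction; they are not InvestESG's dynamics or results.

```python
def payoff(cooperates, n_cooperators, n_companies, base=1.0, cost=0.3, damage=0.6):
    """Toy payoff (hypothetical): climate loss shrinks as more companies
    mitigate, and a cooperating company additionally pays a mitigation cost."""
    climate_loss = damage * (1 - n_cooperators / n_companies)
    return base - climate_loss - (cost if cooperates else 0.0)


def schelling_curves(n_companies=5, cost=0.3):
    """For each number k of OTHER cooperators, return
    (k, focal payoff if cooperating, focal payoff if defecting,
     average payoff over all companies when the focal company defects)."""
    rows = []
    for k in range(n_companies):
        coop = payoff(True, k + 1, n_companies, cost=cost)
        defect = payoff(False, k, n_companies, cost=cost)
        # When the focal company defects: k cooperators, n - k defectors.
        avg = (k * payoff(True, k, n_companies, cost=cost)
               + (n_companies - k) * defect) / n_companies
        rows.append((k, coop, defect, avg))
    return rows


def is_social_dilemma(rows):
    """Defection pays more for the focal company at every k, yet universal
    cooperation beats universal defection."""
    defection_dominates = all(defect > coop for _, coop, defect, _ in rows)
    all_cooperate = rows[-1][1]  # focal cooperates, all others cooperate
    all_defect = rows[0][2]      # focal defects, no one else cooperates
    return defection_dominates and all_cooperate > all_defect


dilemma = is_social_dilemma(schelling_curves(cost=0.3))
no_dilemma = is_social_dilemma(schelling_curves(cost=0.05))
```

With the assumed cost of 0.3, the defection curve lies above the cooperation curve for every k, which is the signature of a social dilemma. Lowering the mitigation cost to 0.05 makes cooperation individually rational, loosely analogous to the effect the diagrams attribute to ESG-conscious investors.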