Thrust 1: Computational Scenario Planning

We will develop agent-based models for planning possible sea level rise (SLR) scenarios, drawing on our expertise in probabilistic modeling, sequential decision making, and dynamic programming. In particular, we will use (multi-agent) reinforcement learning and game theory techniques to model the interactions between different stakeholders.

Preliminary Results

As a starting point, we consider a simple three-agent scenario in a city setting. The agents are

the government, which is responsible for making investment decisions on infrastructure improvements against SLR,
the residents, who decide whether or not to pay extra taxes to support the government's infrastructure improvement program and the recovery from major natural disasters, and
nature, which at each time step may or may not cause significant damage to the city through a major natural disaster (e.g., a hurricane), depending on the current state of the infrastructure, bringing about large recovery costs.

Specifically, the government's cost function is given by

$$C = \sum_{n} a^{n} \left[ 0.1\, x_n + 2\, z_n \left( 1 - \frac{y_n}{2} \right) \right],$$

where n indexes the time steps from the current time, x_n denotes the government's decision to invest in infrastructure (either 0 or 1), y_n denotes the residents' decision to pay additional tax (either 0 or 1), and z_n denotes nature's response as the number of major natural disasters causing significant damage (0, 1, 2, ...). Furthermore, we define the discount factor a (between 0 and 1) as the government's cooperation index. In particular, a small a corresponds to a non-cooperative government that heavily discounts (i.e., disregards) future costs. Each infrastructure investment incurs a standard cost of 0.1 units for the government, and each instance of large damage caused by a natural disaster incurs a standard cost of 2 units. If the residents agree to pay extra taxes, they cover half of the disaster cost. Otherwise, the entire disaster cost is charged to the government.
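As a concrete reading of the cost description above, the following minimal Python sketch accumulates the discounted cost for given decision sequences. The function name `government_cost` and the exact per-step form (investment cost plus the government's share of the disaster cost, weighted by a^n) are assumptions inferred from the text, not the authors' implementation.

```python
INVEST_COST = 0.1    # standard cost of one infrastructure investment (from the text)
DISASTER_COST = 2.0  # standard cost per major disaster (from the text)


def government_cost(xs, ys, zs, a):
    """Discounted government cost for given decision sequences.

    xs[n], ys[n] are the 0/1 decisions of government and residents,
    zs[n] is the number of major disasters at step n, and
    a is the discount factor (the government's cooperation index).
    """
    total = 0.0
    for n, (x, y, z) in enumerate(zip(xs, ys, zs)):
        # Residents who pay extra taxes cover half of the disaster cost.
        government_share = 0.5 if y == 1 else 1.0
        total += (a ** n) * (INVEST_COST * x + DISASTER_COST * z * government_share)
    return total


# Two steps: invest with paying residents, then no investment and no tax support.
print(government_cost(xs=[1, 0], ys=[1, 0], zs=[1, 1], a=0.5))
```

With a small a, the second-step disaster contributes little to the total, which is the sense in which a non-cooperative government disregards future costs.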

The responses x, y, and z are modeled using two states: the state of the infrastructure, which is the sum of the x's over time, and the state of the sea level, which is the sum of the sea level rise amounts over time. We model the sea level rise amount at each time step with a gamma distribution, and the residents' decision with a Bernoulli distribution whose probability parameter is a function of the infrastructure state, the sea level state, and the residents' cooperation index. Non-cooperative residents (denoted by a small index close to 0) have a small probability of paying extra taxes to support the government against SLR. In contrast, highly cooperative residents (denoted by a large positive index) are very likely to help the government by paying extra taxes. Under the assumed model, the residents' willingness to contribute is triggered by the seriousness of the government in taking action against SLR (reflected in the cumulative infrastructure state), as well as by the severity of SLR (reflected in the sea level state). Nature's response is modeled using a Poisson distribution, which is typically used to model the number of occurrences of an event that results from many random factors, each with a small probability. The expected waiting period for a disaster is directly proportional to the readiness of the infrastructure (i.e., the infrastructure state) and inversely proportional to the amount of SLR (i.e., the sea level state).
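The stochastic model described above can be sketched as a one-step simulator. The text does not give the functional forms, so everything specific here is an assumption made for illustration: the payment probability 1 - exp(-r (infra + sea)), the disaster rate sea / (1 + infra), the gamma shape and scale, and all function names.

```python
import math
import random


def resident_pay_prob(infra, sea, r):
    """Assumed form: grows with infrastructure state, sea level state, and index r >= 0."""
    return 1.0 - math.exp(-r * (infra + sea))


def disaster_rate(infra, sea):
    """Assumed form: expected waiting period (1 + infra) / sea, i.e. rate sea / (1 + infra)."""
    return sea / (1.0 + infra)


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, count, product = math.exp(-rate), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def step(infra, sea, x, r, rng, slr_shape=2.0, slr_scale=0.05):
    """Advance one time step and return (infra, sea, y, z)."""
    infra += x                                       # investment raises the infrastructure state
    sea += rng.gammavariate(slr_shape, slr_scale)    # gamma-distributed sea level rise
    y = 1 if rng.random() < resident_pay_prob(infra, sea, r) else 0   # Bernoulli decision
    z = sample_poisson(disaster_rate(infra, sea), rng)                # Poisson disaster count
    return infra, sea, y, z


rng = random.Random(0)
infra, sea = 0, 0.0
for n in range(5):
    infra, sea, y, z = step(infra, sea, x=1 if n == 0 else 0, r=1.0, rng=rng)
    print(n, infra, round(sea, 3), y, z)
```

The offset of 1 in the disaster rate is only there to keep the rate finite when no investment has been made yet; the original model may handle that case differently.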

Finding the government's optimum policy, i.e., the policy that minimizes the expected cost

$$\mathbb{E}\left[ \sum_{n} a^{n} \left( 0.1\, x_n + 2\, z_n \left( 1 - \frac{y_n}{2} \right) \right) \right],$$

defines a Markov decision process (MDP) and can be solved efficiently through Bellman's equation in dynamic programming. Specifically, the optimum policy chooses the investment action that minimizes the expected cost at each time step n. In this preliminary study, we are able to show analytically that the optimum decision rule is a threshold on the sea level state that depends on the infrastructure state. As shown in the figure below, this can be illustrated as building a wall against the rising sea level by adding a layer of bricks (i.e., an infrastructure improvement) whenever the sea level reaches a certain threshold (i.e., the red line in the figure below).
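To make the dynamic programming step concrete, here is a minimal value-iteration sketch on a small discretized state grid. It is not the analytical solution referred to above. The grid sizes, the discrete stand-in for the gamma-distributed rise (`RISE_PROBS`), the disaster rate, and the payment probability are all assumptions, and the expected disaster cost treats the residents' and nature's responses as conditionally independent given the state.

```python
import math

INVEST_COST, DISASTER_COST = 0.1, 2.0
I_MAX, S_MAX, SEA_STEP = 10, 20, 0.1
# Assumed discrete stand-in for the gamma-distributed rise: grid steps -> probability.
RISE_PROBS = {0: 0.3, 1: 0.5, 2: 0.2}


def stage_cost(i, s, r):
    """Expected one-step disaster cost to the government in state (i, s)."""
    sea = SEA_STEP * s
    rate = sea / (1.0 + i)                    # assumed Poisson rate of disasters
    pay = 1.0 - math.exp(-r * (i + sea))      # assumed residents' payment probability
    return DISASTER_COST * rate * (1.0 - 0.5 * pay)


def q_value(V, i, s, x, a, r):
    """Expected cost of action x in state (i, s), then following the values in V."""
    j = i + x
    future = sum(p * V[j][min(s + d, S_MAX)] for d, p in RISE_PROBS.items())
    return INVEST_COST * x + stage_cost(j, s, r) + a * future


def actions(i):
    """No further investment is possible at the top of the grid."""
    return (0,) if i == I_MAX else (0, 1)


def solve(a, r, sweeps=300):
    """Value iteration with Bellman backups; requires a < 1 to converge."""
    V = [[0.0] * (S_MAX + 1) for _ in range(I_MAX + 1)]
    for _ in range(sweeps):
        V = [[min(q_value(V, i, s, x, a, r) for x in actions(i))
              for s in range(S_MAX + 1)] for i in range(I_MAX + 1)]
    policy = [[min(actions(i), key=lambda x: q_value(V, i, s, x, a, r))
               for s in range(S_MAX + 1)] for i in range(I_MAX + 1)]
    return V, policy


V, policy = solve(a=0.9, r=1.0)
for i in range(3):
    invest_at = [s for s in range(S_MAX + 1) if policy[i][s] == 1]
    print("infrastructure", i, "first invests at sea level",
          SEA_STEP * invest_at[0] if invest_at else None)
```

Printing the first sea level at which the computed policy invests, for each infrastructure state, is one way to inspect whether this toy model reproduces a threshold-type rule; rerunning `solve` with different values of `a` and `r` shows how that level moves.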

The threshold is determined by the cooperation coefficients of the government and the residents. In particular, the higher the cooperation coefficients are, the lower the threshold becomes. This holds especially for the government's cooperation coefficient. Intuitively, as the government's cooperation coefficient grows, the government becomes more cautious about the expected future costs of not improving the infrastructure against SLR (i.e., it weighs them more objectively, with less discounting) and sets a lower threshold for investment. Conversely, a small cooperation coefficient implies underestimated future costs, and thus overemphasized investment costs, which results in a high threshold for investment.

Given the cooperation coefficients, the government's investment decision is made for each system state pair (infrastructure state and sea level state) according to the optimum policy. Different community prototypes are then compared, by varying the cooperation coefficients of the government and the residents, in terms of the expected objective cost function

$$\mathbb{E}\left[ \sum_{n} \left( 0.1\, x_n + 2\, z_n \left( 1 - \frac{y_n}{2} \right) \right) \right]$$

without any discount for future costs. As shown in the figure below, cooperation among stakeholders yields an orders-of-magnitude decrease in the expected objective future cost.
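A comparison of this kind can be approximated by Monte Carlo simulation. The sketch below is illustrative only and makes several assumptions: the threshold rule (invest when the sea level exceeds `threshold * (1 + infra)`) is supplied by hand rather than derived from the optimum policy, the functional forms for the disaster rate and payment probability are hypothetical, and each step uses the conditional expected cost given the state instead of sampling disasters and tax decisions. It is not intended to reproduce the reported figure.

```python
import math
import random

INVEST_COST, DISASTER_COST = 0.1, 2.0


def expected_stage_cost(infra, sea, x, r):
    """Expected one-step cost given the state, averaging over y and z."""
    rate = sea / (1.0 + infra)                    # assumed Poisson disaster rate
    pay = 1.0 - math.exp(-r * (infra + sea))      # assumed payment probability
    return INVEST_COST * x + DISASTER_COST * rate * (1.0 - 0.5 * pay)


def mean_undiscounted_cost(threshold, r, horizon=50, runs=2000, seed=0):
    """Average total cost, without discounting, under a hand-set threshold rule."""
    rng = random.Random(seed)
    total = 0.0
    for run in range(runs):
        infra, sea = 0, 0.0
        for n in range(horizon):
            sea += rng.gammavariate(2.0, 0.05)    # assumed gamma shape and scale
            x = 1 if sea > threshold * (1 + infra) else 0
            infra += x
            total += expected_stage_cost(infra, sea, x, r)
    return total / runs


# Hypothetical prototypes: a low threshold stands in for a cooperative government.
for label, threshold, r in [("cooperative", 0.2, 2.0), ("non-cooperative", 2.0, 0.1)]:
    print(label, round(mean_undiscounted_cost(threshold, r), 2))
```

Fixing the seed gives every prototype the same sea level paths, so differences in the printed costs come from the threshold and the residents' index rather than from sampling noise.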

Future Work

Leveraging our initial model, we will investigate more complex scenarios with additional stakeholders. We will mainly use two modeling approaches: single-agent-focused modeling and multi-agent-focused modeling. In the former, similar to the initial model discussed in Preliminary Results, the optimum decision policy for a specific stakeholder will be investigated by using appropriate probabilistic models for the decisions of the other stakeholders.

This approach will be useful for targeting a specific stakeholder, e.g., for generating targeted reports for specific stakeholders (see the Community Engagement page). Several single-agent reinforcement learning (RL) techniques will be used to that end, as discussed below. In the latter, the decision policies of multiple stakeholders will be considered simultaneously using multi-agent RL and game theory tools. Although multi-agent models are more general and realistic, single-agent RL models are more convenient when focusing on a specific stakeholder, since many more practical algorithms are available for single-agent RL.

Reinforcement Learning (RL): To address complex modeling issues and to obtain approximately optimum policies where exact optimum policies are not attainable, we will investigate different RL structures, such as active learning (e.g., a stakeholder that can obtain only limited observations selects what to observe at each time step), online learning (e.g., a stakeholder may want to balance exploring new strategies against exploiting known ones), Q-learning for finding approximately optimum policies in complex scenarios, and generalized policy iteration methods (e.g., the actor-critic method, where the actor is a stakeholder such as the government or a company, and the critic is a consulting company).
Multi-agent RL & Game Theory: We will formulate the SLR scenarios in game settings, and seek solutions using multi-agent RL and stochastic game theory techniques. For example, for the initial 3-agent scenario, we will study the true multi-agent case, in which the residents' decision is governed by a cost function specific to resident dynamics, similar to the one given for the government. We anticipate that the results will support conclusions about the importance of cooperation similar to those reached with the random modeling of resident decisions, albeit with significantly more complex analysis. Specifically, it may not be possible to identify the optimum policy analytically, e.g., as a thresholding rule, due to the interactions between the government's and the residents' decision-making procedures. In this case, the (approximately) optimum policies may be computed through extensive Monte Carlo simulations for each system state pair. An example of the game theory techniques that we will consider is the stochastic Stackelberg game, in which a leader, such as the government, drives the actions of a number of followers, e.g., the other stakeholders. In contrast to the abundant literature on the Stackelberg game framework, our work will involve conflicting criteria and objectives for the government and a set of heterogeneous stakeholders. Our collaborator Dr. Walid Saad, Associate Professor of Electrical and Computer Engineering at Virginia Tech, has a considerable body of work on Stackelberg games and related ideas.
Multimodal Data Fusion: We will also explore integrating heterogeneous data into RL methods by building upon PI Yilmaz's experience in multimodal data fusion. Specifically, we will aim at fusing disparate types of historical data from stakeholders, such as insurance premiums, bird counts, hurricane magnitudes, durations, paths, damage and casualties, SLR amounts, etc., to discover patterns and to model interactions and cost functions.
Computational Efficiency: As we include new stakeholders in the scenario planning model (e.g., a business stakeholder, an environmental stakeholder, a social stakeholder, etc.), the complex web of interactions means that finding (approximately) optimum policies for each stakeholder will call for computationally intensive simulations, efficient models, and efficient computing architectures. To that end, we will rely on PI Yilmaz's expertise in sequential decision making and in developing computationally efficient probabilistic models and inference algorithms for complex problems, as well as Senior Personnel Chang's expertise in big data architectures and cloud computing.
Fine-grained Modeling: We will draw on our collaborations with stakeholders to understand the characteristics of the interactions between stakeholders, and to model the stakeholders' cost functions accordingly. Moreover, we plan to collaborate with social scientists to understand the details of the cooperation coefficients, and to model them by decomposing them into a number of fundamental traits of governments and residents, such as political, ethical, sociological, and religious traits. Specifically, we plan to collaborate with our colleagues in the Anthropocene Working Group at USF, of which we are members. Together with Dr. Cheryl Hall, Associate Professor of Political Theory at the School of Interdisciplinary Global Studies, we will study the emotional and political aspects of people and governments caring about or ignoring SLR; and together with Dr. Martin Schonfeld, Professor of Philosophy, whose area of specialization is climate philosophy and ethics, we will investigate the impacts of climate ethics and religions on the decisions of governments and residents about SLR. Through such transdisciplinary research, we hope to discover the underlying factors of cooperation against SLR for different stakeholders, and to use these findings to develop fine-grained computational models.
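As an illustration of the Q-learning option mentioned under Reinforcement Learning above, the following sketch learns government action values from simulated experience only, without using the model's transition probabilities. The simulator `env_step`, the grid sizes, the learning rate, the exploration rate, and the functional forms are all hypothetical choices for this example and are not part of the proposed work plan.

```python
import math
import random

INVEST_COST, DISASTER_COST = 0.1, 2.0
I_MAX, S_MAX, SEA_STEP = 5, 10, 0.2


def sample_poisson(rate, rng):
    """Knuth's multiplication method for small rates."""
    limit, count, product = math.exp(-rate), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def env_step(i, s, x, r, rng):
    """Hypothetical simulator: returns the sampled cost and the next state (i, s)."""
    i = min(i + x, I_MAX)
    sea = SEA_STEP * s
    z = sample_poisson(sea / (1.0 + i), rng)                       # disasters
    y = 1 if rng.random() < 1.0 - math.exp(-r * (i + sea)) else 0  # tax decision
    cost = INVEST_COST * x + DISASTER_COST * z * (0.5 if y else 1.0)
    s_next = min(s + rng.choice((0, 1, 1, 2)), S_MAX)              # discretized rise
    return cost, i, s_next


def q_learning(a=0.9, r=1.0, episodes=2000, horizon=30, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning for cost minimization with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = [[[0.0, 0.0] for _ in range(S_MAX + 1)] for _ in range(I_MAX + 1)]
    for episode in range(episodes):
        i, s = 0, 0
        for n in range(horizon):
            if rng.random() < eps:
                x = rng.randrange(2)                      # explore
            else:
                x = 0 if Q[i][s][0] <= Q[i][s][1] else 1  # exploit the cheaper action
            cost, i_next, s_next = env_step(i, s, x, r, rng)
            target = cost + a * min(Q[i_next][s_next])
            Q[i][s][x] += alpha * (target - Q[i][s][x])
            i, s = i_next, s_next
    return Q


Q = q_learning()
greedy = [[0 if q[0] <= q[1] else 1 for q in row] for row in Q]
for i, row in enumerate(greedy):
    print("infrastructure", i, row)
```

Each printed row lists the learned greedy action per sea level index for one infrastructure state. States that are rarely visited keep their initial values, so only the frequently visited part of the table is informative.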