We are going to develop agent-based models for planning for possible sea level rise (SLR) scenarios, using our expertise in probabilistic modeling, sequential decision making, and dynamic programming. In particular, we will use (multi-agent) reinforcement learning and game theory techniques to model the interactions between different stakeholders.
We consider a simple 3-agent scenario in a city setting as a starting point. The considered agents are the government, the residents, and nature.
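Specifically, the government's cost function is given by

$$C = \sum_{n=0}^{\infty} a^{n} \left[ 0.1\, x_n + 2\, z_n \left( 1 - \tfrac{1}{2} y_n \right) \right],$$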
where n denotes the number of time steps from the current time; x denotes the government's decision to invest in infrastructure (0 or 1); y denotes the residents' decision to pay additional tax (0 or 1); and z denotes nature's response, i.e., the number of major natural disasters causing significant damage (0, 1, 2, ...). Furthermore, we define the discount factor a (between 0 and 1) as the government's cooperation index. In particular, a small a corresponds to a non-cooperative government that heavily discounts (i.e., disregards) future costs. When an infrastructure investment is made, the government incurs a standard cost of 0.1 units. Each natural disaster causing major damage incurs a standard cost of 2 units. If the residents agree to pay extra taxes, they cover half of the disaster cost; otherwise, the entire disaster cost is charged to the government.
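As an illustration, the Python sketch below evaluates this cost for a given finite trajectory of decisions. Each step contributes 0.1 x + 2 z (1 - y/2), weighted by a^n. Truncating the sum at the trajectory length is our simplification, and the function name `government_cost` is hypothetical.

```python
def government_cost(x, y, z, a):
    """Discounted government cost of a finite trajectory (illustrative sketch).

    x[n]: 1 if the government invests at step n, else 0
    y[n]: 1 if the residents pay the extra tax at step n, else 0
    z[n]: number of major disasters at step n
    a:    government's cooperation index (discount factor), 0 <= a <= 1
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    total = 0.0
    for n, (xn, yn, zn) in enumerate(zip(x, y, z)):
        investment = 0.1 * xn                 # standard infrastructure investment cost
        disaster = 2.0 * zn * (1 - 0.5 * yn)  # residents cover half if they pay the tax
        total += a ** n * (investment + disaster)
    return total
```

For example, `government_cost([1, 0, 0], [0, 1, 0], [0, 1, 2], a=0.5)` returns approximately 1.6, i.e., 0.1 + 0.5 · 1 + 0.25 · 4.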
The responses x, y, and z are modeled using two state variables: the infrastructure state, which is the sum of the x's over time, and the sea level state, which is the sum of the sea level rise amounts over time. We model the sea level rise amount at each time step with a gamma distribution, and the residents' decision with a Bernoulli distribution whose probability parameter is a function of the infrastructure state, the sea level state, and the residents' cooperation index. Non-cooperative residents (denoted by a small index close to 0) have a small probability of paying extra tax to support the government against SLR. On the contrary, highly cooperative residents (denoted by a large positive index) are very likely to help the government by paying extra taxes. Under the assumed model, the residents' willingness to contribute is triggered by the government's seriousness in taking action against SLR (reflected in the cumulative infrastructure state), as well as by the severity of SLR (reflected in the sea level state). Nature's response is modeled using a Poisson distribution, which is typically used to model the number of occurrences of events that result from many random factors, each with a small probability. The expected waiting period for a disaster is directly proportional to the readiness of the infrastructure (i.e., the infrastructure state) and inversely proportional to the amount of SLR (i.e., the sea level state).
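The stochastic model described above can be sketched as a one-step simulator. The specific functional forms are our assumptions for illustration, since the description does not fix them. The SLR increment is Gamma(shape, scale) with shape = 2 and scale = 0.05. The residents pay with probability p = 1 - exp(-b (I + S)), where b is the residents' cooperation index, I the infrastructure state, and S the sea level state. The disaster rate is c S / (I + 1), so the mean waiting time (I + 1) / (c S) grows with the infrastructure and shrinks with SLR (the +1 avoids division by zero). All names and parameter values (`simulate_step`, `sample_poisson`, `c`, `shape`, `scale`) are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_step(infra, sea, x, b, rng, c=0.5, shape=2.0, scale=0.05):
    """Advance the toy model by one time step (all functional forms hypothetical).

    infra: infrastructure state (cumulative number of investments)
    sea:   sea level state (cumulative SLR)
    x:     government's investment decision (0 or 1)
    b:     residents' cooperation index (>= 0)
    """
    infra += x                                   # infrastructure state is the sum of x's
    sea += rng.gammavariate(shape, scale)        # gamma-distributed SLR increment
    p_pay = 1.0 - math.exp(-b * (infra + sea))   # increases with b, infra and sea
    y = 1 if rng.random() < p_pay else 0         # Bernoulli(p_pay) tax decision
    rate = c * sea / (infra + 1)                 # mean waiting time 1/rate = (infra+1)/(c*sea)
    z = sample_poisson(rate, rng)                # number of major disasters this step
    return infra, sea, y, z
```

A trajectory is obtained by calling `simulate_step` repeatedly with a seeded `random.Random` instance and the government's decision for each step. Python's standard library has no Poisson sampler, hence the small Knuth-style helper.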
The problem of finding the government's optimum policy, i.e., the policy that minimizes the expected cost, is a Markov decision process (MDP) and can be solved efficiently through Bellman's equation in dynamic programming. Specifically, at each time step n the optimum policy chooses the investment action that minimizes the expected cost, namely the immediate cost plus the discounted expected cost of all future steps under the optimum policy. In this preliminary study, we were able to show analytically that the optimum decision rule is a threshold rule on the sea level state, where the threshold depends on the infrastructure state. As shown in the figure below, this can be illustrated as building a wall against the rising sea level by adding a layer of bricks (i.e., an infrastructure improvement) whenever the sea level reaches a certain threshold (i.e., the red line in the figure below).
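To illustrate how Bellman's equation yields a threshold rule, the sketch below runs value iteration on a small discretized toy version of the model. Everything about the toy is our assumption, not the model analyzed in the study. The sea level index rises by one level with probability q per step (a crude discrete stand-in for gamma-distributed SLR) and is capped at s_max. The infrastructure level is capped at i_max. Residents pay with probability p = 1 - exp(-b (i + s)), and disasters follow a Poisson distribution with rate c s / (i + 1). The government decides first, after which the sea rises and residents and nature respond, so the expected government cost of a step is (2 - p) times the disaster rate. The names `solve_mdp` and `thresholds` are hypothetical.

```python
import math


def solve_mdp(a, b, c=0.05, q=0.5, i_max=5, s_max=19, invest_cost=0.1, tol=1e-9):
    """Value iteration for the toy (infrastructure, sea level) MDP; returns (V, policy)."""

    def step_cost(i, s):
        # Expected government disaster cost: E[(2 - y) z] = (2 - p) * rate.
        p = 1.0 - math.exp(-b * (i + s))
        rate = c * s / (i + 1)
        return (2.0 - p) * rate

    def q_value(V, i, s, x):
        # Bellman right-hand side for action x in state (i, s).
        i_next = min(i + x, i_max)
        total = invest_cost * x
        for s_next, prob in ((s, 1.0 - q), (min(s + 1, s_max), q)):
            total += prob * (step_cost(i_next, s_next) + a * V[i_next][s_next])
        return total

    V = [[0.0] * (s_max + 1) for _ in range(i_max + 1)]
    while True:
        V_new = [[min(q_value(V, i, s, 0), q_value(V, i, s, 1))
                  for s in range(s_max + 1)] for i in range(i_max + 1)]
        delta = max(abs(V_new[i][s] - V[i][s])
                    for i in range(i_max + 1) for s in range(s_max + 1))
        V = V_new
        if delta < tol:
            break
    # Invest only when it is strictly cheaper than waiting.
    policy = [[int(q_value(V, i, s, 1) < q_value(V, i, s, 0))
               for s in range(s_max + 1)] for i in range(i_max + 1)]
    return V, policy


def thresholds(policy):
    """Smallest sea level index at which the policy invests, per infrastructure level."""
    return [next((s for s, x in enumerate(row) if x == 1), None) for row in policy]
```

With a = 0 and b = 0 and the default parameters, `thresholds(solve_mdp(0.0, 0.0)[1])` evaluates to `[2, 6, 12, None, None, None]`: each additional layer of infrastructure is added only at a higher sea level index, and on this capped grid layers beyond the third never pay off. Other values of a and b can be compared in the same way; the toy is not guaranteed to reproduce the trends reported for the full model.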
The threshold is determined by the cooperation coefficients of the government and the residents. In particular, the higher the cooperation coefficients, the lower the threshold. This effect is strongest for the government's cooperation coefficient. Intuitively, as the government's cooperation coefficient grows, the government becomes more cautious about (i.e., assesses more objectively, without discounting) the expected future costs of not improving the infrastructure against SLR, and therefore sets a lower threshold for investment. Conversely, a small cooperation coefficient implies underestimated future costs, and thus overemphasized investment costs, which results in a high threshold for investment.
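To make the role of the government's coefficient concrete, consider a deliberately simplified case (our assumption, not the model analyzed above): a single investment made now lowers the government's expected disaster cost by a constant amount Δ in the current and every future step, and the residents' behavior is held fixed. Comparing the one-time investment cost with the discounted stream of savings gives, for 0 ≤ a < 1,

$$\sum_{n=0}^{\infty} a^{n} \Delta = \frac{\Delta}{1-a} > 0.1 \quad\Longleftrightarrow\quad \Delta > 0.1\,(1-a).$$

Because the disaster rate grows with the sea level state, Δ grows with the sea level state as well. Hence the sea level at which investing pays off falls as a approaches 1, while a small a requires a much larger Δ, and therefore a higher sea level, before investing.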
Given the cooperation coefficients, the government's investment decision for each system state pair (infrastructure state and sea level state) is made according to the optimum policy. Different community prototypes are then compared by varying the cooperation coefficients of the government and the residents, in terms of the expected objective cost without any discount for future costs. As shown in the figure below, the results clearly show that cooperation among stakeholders decreases the expected objective future cost by orders of magnitude.
Building on our initial model, we will investigate more complex scenarios with additional stakeholders. We will mainly use two modeling approaches: single-agent-focused modeling and multi-agent-focused modeling. In the former, similar to the initial model discussed in Preliminary Work, the optimum decision policy for a specific stakeholder will be investigated by using appropriate probabilistic models for the decisions of the other stakeholders.
This approach will be useful for targeting a specific stakeholder, e.g., for generating targeted reports for specific stakeholders (see the Community Engagement page). Several single-agent reinforcement learning (RL) techniques will be used to that end, as discussed below. In the latter, the decision policies of multiple stakeholders will be considered simultaneously using multi-agent RL and game theory tools. Although multi-agent models are more general and realistic, single-agent RL models are more convenient when focusing on a specific stakeholder, since many more practical algorithms are available for single-agent RL.