So far, economic Agent-Based Models (ABMs) have been successful tools for building theories and analyzing policies under more realistic model assumptions. A key feature that distinguishes ABMs from more aggregate economic models is that ABMs explicitly represent the dynamics of micro-components (such as households or firms). This means that the state variables of an ABM can be initialized directly from one or more sources of micro-data. We argue that the ABM community has not taken much advantage of this property, and believe that a more widespread use of micro datasets for initialization would lend credibility to ABMs and could eventually make them more widely accepted.
Over the last ten years, several researchers from different groups have started to build data-driven economic ABMs. In this workshop we would like to connect these researchers and learn from each other: (i) which data are used in which models? (ii) which new techniques, beyond parameter calibration, are needed to initialize ABMs with micro-data? (iii) why is it important to initialize ABMs with real-world data? (iv) should we build a community around data-driven economic ABMs, and if so, how?
What do we mean by “data-driven” Agent-Based Models?
ABMs are mathematical and computational models that explicitly represent the dynamics of the units of a system, without imposing any ex-ante equilibrium constraint. ABMs have never existed independently of data: even the most qualitative ABMs have been formulated with an eye to explaining empirical patterns. More quantitative and realistic ABMs have used (mostly time-series) data to calibrate parameters, or to establish stylized facts, metrics, or summary statistics against which the ABM can be validated.
Here we argue that making an ABM “data-driven” requires a more extensive use of data. In particular, we are interested in ABMs that use micro-data to initialize all or some of the state variables of the agents and the environment.
Examples
A recent flurry of papers modeling the economic effects of the Covid-19 pandemic has used input-output tables to initialize linkages between firms or industries, which turned out to be crucial to properly account for the transmission of industry-specific supply and demand shocks induced by the pandemic. There is growing interest in building macroeconomic ABMs that consider not only input-output tables but also demographic data and Social Accounting Matrices (SAMs), to account for the flow of income at the household level and address questions related to inequality.
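As a minimal sketch of this kind of initialization, consider deriving technical coefficients from a toy input-output flow matrix and assigning them to industry agents as input shares. The matrix, gross outputs, and industry names below are all hypothetical, and real applications would read them from published I-O tables.

```python
import numpy as np

# Toy input-output flow matrix Z (hypothetical values): Z[i, j] is the
# value of goods that industry i sells to industry j as intermediate inputs.
Z = np.array([
    [10.0,  5.0,  2.0],
    [ 4.0, 20.0,  6.0],
    [ 1.0,  3.0, 15.0],
])
gross_output = np.array([50.0, 60.0, 40.0])  # total output of each industry

# Technical coefficients: A[i, j] = share of industry j's output value
# spent on inputs from industry i. Broadcasting divides column j by
# gross_output[j].
A = Z / gross_output

# Each industry agent starts with its empirical input shares, so that an
# industry-specific shock propagates along these linkages in simulation.
industries = [{"name": n, "input_shares": A[:, j]}
              for j, n in enumerate(["agriculture", "manufacturing", "services"])]

for ind in industries:
    print(ind["name"], np.round(ind["input_shares"], 3))
```

Initializing linkages this way means the shock-propagation structure of the model is observed rather than assumed.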
There is a tradition of building housing market models in which the features of individuals and households are initialized from census data, while property values are obtained from real-world transactions and mortgages. These models have been used to explore how the housing market can affect financial stability and respond to climate events.
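A simple way to seed such a model is to draw a synthetic household population from census marginals. The marginal shares below are illustrative, and sampling each attribute independently is a deliberate simplification: reproducing joint distributions would require techniques such as iterative proportional fitting.

```python
import random

random.seed(42)

# Hypothetical census marginals for a region (illustrative shares).
age_band = {"25-34": 0.3, "35-49": 0.4, "50-64": 0.3}
tenure = {"owner": 0.6, "renter": 0.4}

def draw(marginal):
    """Sample one category from a marginal distribution."""
    cats, probs = zip(*marginal.items())
    return random.choices(cats, weights=probs, k=1)[0]

# Draw a synthetic population of household agents. Independent sampling
# ignores correlations between attributes (see note above).
households = [{"age": draw(age_band), "tenure": draw(tenure)}
              for _ in range(10_000)]

owner_share = sum(h["tenure"] == "owner" for h in households) / len(households)
print(round(owner_share, 2))  # close to the 0.6 census marginal
```

The sampled shares converge to the census marginals as the population grows, so aggregate properties of the synthetic population match the data by construction.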
Some ABMs modeling labor markets have used registry data to initialize the probabilities that workers transition between jobs and occupations. This initialization makes it possible to study the constraints on changing jobs much more accurately than with the traditional matching function.
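The estimation step can be sketched as counting observed job-to-job moves and normalizing them into a transition matrix. The registry-style records and occupation names below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical registry-style records: (worker_id, occupation_before, occupation_after).
records = [
    (1, "clerk", "clerk"), (2, "clerk", "analyst"), (3, "clerk", "clerk"),
    (4, "analyst", "analyst"), (5, "analyst", "manager"), (6, "manager", "manager"),
]

# Count observed moves per origin occupation.
moves = defaultdict(Counter)
for _, origin, destination in records:
    moves[origin][destination] += 1

# Empirical transition probabilities: P[origin][destination].
P = {origin: {dest: n / sum(c.values()) for dest, n in c.items()}
     for origin, c in moves.items()}

print(P["clerk"])  # e.g. probabilities for workers starting as clerks
```

Worker agents initialized with these probabilities move between occupations at empirically observed rates, rather than at rates implied by an aggregate matching function.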
Techniques
Initializing ABMs with micro-data is straightforward when all the variables of the model are observed and we initialize the model in the present state to make projections about the future. But what if some of the model variables are latent, or if we observe micro-data over multiple time steps and need to make simulated outcomes compatible with empirical data? Dealing with these issues demands new theoretical tools that go beyond the traditional empirical toolkit of parameter calibration.
Data assimilation, extensively used in weather forecasting and climate science, is a methodology for estimating the latent state of a model using all available information. Data assimilation has recently started to be applied to economic ABMs, making it possible to infer latent variables such as agent beliefs. Making the state variables of an ABM compatible with all observed macro- and micro-data could make ABMs particularly good at forecasting, potentially competing with more standard approaches such as Dynamic Stochastic General Equilibrium (DSGE) models.
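To make the idea concrete, here is a minimal bootstrap particle filter, one standard data-assimilation technique: a latent "sentiment" state drives observed aggregate output, and the filter infers it from noisy observations. The state dynamics, observation series, and noise levels are all illustrative assumptions, not a description of any particular published model.

```python
import math
import random
import statistics

random.seed(0)

N = 1000                       # number of particles
obs = [0.2, 0.5, 0.9, 1.1]     # hypothetical observed output series
sigma_state, sigma_obs = 0.3, 0.2

# Prior over the latent sentiment.
particles = [random.gauss(0.0, 1.0) for _ in range(N)]

def likelihood(y, s):
    """Gaussian observation density (up to a constant)."""
    return math.exp(-0.5 * ((y - s) / sigma_obs) ** 2)

estimates = []
for y in obs:
    # 1. Propagate: latent sentiment follows a random walk.
    particles = [s + random.gauss(0.0, sigma_state) for s in particles]
    # 2. Weight particles by how well they explain the observation.
    weights = [likelihood(y, s) for s in particles]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3. Resample particles in proportion to their weights.
    particles = random.choices(particles, weights=probs, k=N)
    estimates.append(statistics.fmean(particles))

print(estimates)  # filtered sentiment estimates, one per observation
```

In an actual ABM application the propagation step would run the simulation forward rather than a random walk, but the weight-and-resample logic is the same.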
Broader questions
Why is it important to initialize economic ABMs with micro-data? Is it because it makes the model more realistic, so that policy recommendations drawn from the ABM could be more credible? Is it because initializing a model with real-world data could lead to insights that would be missed in a more theoretical setting? Is it useful for forecasting? How do we relate to other communities in economics that have much more experience with using data in economic models, such as the microsimulation and Computable General Equilibrium (CGE) communities? And how do we learn from other disciplines, such as epidemiology or ecology, that also make extensive use of micro-data in ABMs and have developed techniques, communities, and software tools for data-driven ABMs?