5 major metropolitan areas: Seattle, Los Angeles, Chicago, New York, Atlanta
Metropolitan Area Definition: Includes the county that contains the downtown of the city itself as well as the counties along the perimeter of this downtown county to account for the daily commutes of workers into and out of the downtown area.
Overarching Process of Construction
County populations scaled down so one node represents a group of individuals; parameters include scale factor
County level network created with Watts-Strogatz graph; parameters include K nearest neighbors and rewiring probability
Greater metropolitan area level network created by edges between counties; parameters include blue edge probabilities and red edge probabilities
National level network created by edges between greater metropolitan areas; parameters include flight data probabilities
SEIR Model
Beta: β is the probability that an infected person will infect a susceptible person. It was derived from the reproductive number equation, R0 = βk / μ, whose terms are defined below.
R0 is the reproductive number, or average number of secondary infections caused by a single individual in a fully susceptible population. According to the World Health Organization, R0 is predicted to be between 2 and 2.5 [1]. For the purpose of computing beta, we took the average of this range, which is 2.25.
k is the average number of people that an infectious individual is connected to. The POLYMOD study in Europe examined how many people an individual comes into contact with per day, on average. The value of k was estimated to be 13.4 individuals [2].
μ is the recovery rate, computed as 1 − d, where d is the death rate. The death rate is the number of reported deaths divided by the total number of reported cases, which, according to the Wall Street Journal, amounts to 39,095 / 735,366 = 0.053 in the United States [3]. So μ = 1 − 0.053 = 0.947.
Putting this together, β = R0μ / k = (2.25 × 0.947) / 13.4 = 0.1590 ≈ 16%.
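The arithmetic above can be reproduced in a few lines. This is a minimal sketch that only restates the figures quoted in the text; the function name `compute_beta` is hypothetical, and rounding d to three decimals is an assumption made to match the 0.053 shown above.

```python
def compute_beta(r0, k, deaths, cases):
    """Rearrange R0 = beta * k / mu to solve for beta."""
    d = round(deaths / cases, 3)  # death rate, rounded to 3 decimals as in the text
    mu = 1 - d                    # recovery rate as defined in this write-up
    beta = r0 * mu / k
    return d, mu, beta


r0 = (2.0 + 2.5) / 2              # midpoint of the WHO range quoted above
d, mu, beta = compute_beta(r0, k=13.4, deaths=39_095, cases=735_366)
print(f"d = {d:.3f}, mu = {mu:.3f}, beta = {beta:.4f}")
```

Keeping the inputs as arguments makes it easy to see how sensitive β is to the choice of R0 within the 2 to 2.5 range.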
Node Scale Factor: In theory, each node would represent a single person in the model. However, computation time increases with each added node, and a one-to-one network would contain about 91.7 million nodes, making it too computationally expensive to build and run experiments on. We therefore represent 5,000 people as one node, largely because the network can be constructed in a relatively short time at this scale factor.
Maximum Iterations: The maximum number of iterations, or days, that the model runs for was set high enough that most nodes in the network have been removed by the end of the run, so that the propagation of the virus can be studied from start to finish.
National Level
Metropolitan Edge Probabilities: Determined from flight data discussed in more detail below under the Data - Metropolitan Area Probabilities section.
Metropolitan Area Level
Blue and Red Edge Probabilities: Blue edges connect two counties that surround the city, and red edges connect the county that contains the city to a surrounding county. The blue edge probability was set lower than the red edge probability because there tends to be less traffic between two surrounding counties than between a county that contains a city and one that does not, since more people travel to cities for work and entertainment. These probabilities were found by trial and error, and the current values were chosen because they best mimic the true propagation of the virus. Due to project time constraints, these two probabilities are uniform across all counties and metropolitan areas (the Data - County Populations section includes more information and visualizations).
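One way to picture how blue and red probabilities could turn into edges is sketched below. The text does not say how edges between counties are sampled, so the per-node-pair Bernoulli draw, the function name `connect_counties`, and the probability values in the usage lines are all assumptions for illustration, not the project's actual values.

```python
import itertools
import random


def connect_counties(county_nodes, central, p_red, p_blue, seed=None):
    """Sample inter-county edges for one metropolitan area.

    county_nodes: dict mapping county name -> list of node ids
    central:      name of the county that contains the downtown
    p_red:        edge probability between a central-county node and a
                  perimeter-county node (assumed to apply per node pair)
    p_blue:       edge probability between nodes of two perimeter counties
    """
    rng = random.Random(seed)
    edges = []
    for a, b in itertools.combinations(county_nodes, 2):
        # Red if the pair involves the central county, blue otherwise.
        p = p_red if central in (a, b) else p_blue
        for u in county_nodes[a]:
            for v in county_nodes[b]:
                if rng.random() < p:
                    edges.append((u, v))
    return edges


# Placeholder counties and probabilities, chosen only to exercise the code.
area = {"central": [0, 1, 2], "west": [3, 4], "east": [5, 6]}
print(connect_counties(area, "central", p_red=0.5, p_blue=0.1, seed=0))
```

The double loop is quadratic in county size, which is acceptable at the 5,000-to-1 scale described above but would need a sparser sampling scheme for larger networks.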
County Level
K Nearest Neighbors: An edge exists between every node and its K nearest neighbors, K/2 on each side of the ring lattice structure of a Watts-Strogatz graph. Since our scale factor is 5,000 individuals to 1 node, we set K=2 because an individual in a group of 5,000 may know another individual in a group of 5,000 nearby, but is much less likely to know someone another 5,000 individuals away.
Rewiring Probability: Every "neighbor" edge in the ring lattice has a probability of being rewired. Rewiring an edge means keeping its source node and choosing a new target node uniformly at random. We set rewire_prob = 0.10 to account for the instances in which an individual in a group of 5,000 comes in contact with an individual in another group of 5,000 that is "across the county" from them, and not necessarily their immediate neighbor.
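The county-level construction described above (ring lattice with K = 2, then rewiring with probability 0.10) can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the project's code: the function name `watts_strogatz_edges` is hypothetical, and the sketch assumes at least 3 nodes per county. The networkx package provides `watts_strogatz_graph(n, k, p, seed=None)` for the same model; the text does not state whether it was called directly.

```python
import random


def watts_strogatz_edges(n, k=2, rewire_prob=0.10, seed=None):
    """Return the undirected edge set of a Watts-Strogatz graph (n >= 3)."""
    rng = random.Random(seed)
    half = k // 2

    # Ring lattice: each node is linked to its k/2 neighbours on each side.
    edges = set()
    for u in range(n):
        for j in range(1, half + 1):
            edges.add(frozenset((u, (u + j) % n)))

    # Rewiring: keep the source node u, pick a new target uniformly at
    # random among nodes that are not u and not already linked to u.
    for u in range(n):
        for j in range(1, half + 1):
            if rng.random() < rewire_prob:
                candidates = [w for w in range(n)
                              if w != u and frozenset((u, w)) not in edges]
                if candidates:
                    edges.remove(frozenset((u, (u + j) % n)))
                    edges.add(frozenset((u, rng.choice(candidates))))
    return edges


county = watts_strogatz_edges(n=20, k=2, rewire_prob=0.10, seed=42)
print(len(county), "edges")
```

Rewiring moves edges rather than adding them, so the edge count stays at n·K/2 regardless of the rewiring probability.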
Question: How is population data used to create county level networks? How are these networks being connected to one another to represent metropolitan areas?
Data Source: SimpleMaps United States Cities Database
Data Selection: The database provides the population and county location of all U.S. Census-recognized cities and towns. The populations of all cities in the selected counties of each greater metropolitan area were gathered. The data was last updated on September 11, 2019.
Calculation Process: Populations of all cities within each county were summed to give final county level populations.
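The summation step, combined with the 5,000-to-1 node scale factor described earlier, might look like the sketch below. The input layout (a list of `(county, population)` pairs), the function name `county_node_counts`, the rounding rule, and the minimum of one node per county are all assumptions; the population figures in the usage lines are invented for illustration.

```python
from collections import defaultdict

SCALE_FACTOR = 5_000  # people represented by one node


def county_node_counts(city_rows, scale=SCALE_FACTOR):
    """Sum city populations per county, then convert to a node count."""
    totals = defaultdict(int)
    for county, population in city_rows:
        totals[county] += population
    # Assumed rule: round to the nearest node and keep at least one node.
    return {county: max(1, round(pop / scale))
            for county, pop in totals.items()}


# Invented rows, not taken from the SimpleMaps database.
rows = [("County A", 12_000), ("County A", 9_000), ("County B", 4_000)]
print(county_node_counts(rows))
```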
County Selections: Maps of green and yellow highlighted counties of the five greater metropolitan areas. Green counties contain the downtown metropolitan area itself, yellow counties border this central green county. Blue and red arrows reflect the edge probabilities between counties.
Seattle
Los Angeles
Chicago
New York
Atlanta
Question: What is the probability that an individual would travel between any two metropolitan areas?
Data Source: Airline Origin and Destination Survey (DB1B)
Data Selection: The DB1B data provides a 10% sample of all domestic airline tickets on a quarterly basis and separates data by ticket, market, and coupon characteristics. 2019Q1 coupon data was used to estimate metropolitan area probabilities for two reasons. First, since the virus began its spread in 2020Q1 and airline data for this quarter was not yet available, 2019Q1 data was used to proxy for airline routes and passenger travel that might occur during this time. Second, unlike market and ticket level data, coupon data provides information regarding individual flights between origin and destination pairs, thereby offering the relevant data points required to calculate the probability of travel between any two metropolitan areas.
Calculation Process:
The data was aggregated by unique origin and destination airport pairs to gather the total number of passengers flying between any two airports in 2019Q1.
Airports present in selected metropolitan areas were identified and used to filter the data such that it only included airports located in one of the chosen metropolitan areas.
The total number of passengers travelling from any metropolitan area A to any metropolitan area B was found by summing traffic moving from any origin airport located in area A to any destination airport located in area B.
The total population of each metropolitan area was calculated by summing the populations of the counties that constitute that area.
The probability of travelling from any metropolitan area A to any metropolitan area B was determined by dividing the total number of passengers travelling from area A to B by the total population of origin metropolitan area A.
The probability of travelling between any two metropolitan areas A and B then equals the sum of the probability of travelling from A to B and the probability of travelling from B to A.
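The calculation steps above can be sketched as a single function. The record layout (`(origin_airport, destination_airport, passengers)` tuples), the function name `metro_travel_probabilities`, and every number in the usage lines are assumptions for illustration; the sketch uses passenger counts exactly as given and does not adjust for DB1B being a 10% sample, since the text does not describe such an adjustment.

```python
from collections import defaultdict


def metro_travel_probabilities(coupons, airport_to_metro, metro_population):
    """Return {frozenset({A, B}): P(A->B) + P(B->A)} for metro area pairs."""
    # Keep only airports in the chosen areas and sum passengers
    # from each origin area to each destination area.
    flows = defaultdict(int)
    for origin, dest, passengers in coupons:
        a = airport_to_metro.get(origin)
        b = airport_to_metro.get(dest)
        if a is None or b is None or a == b:
            continue  # airport outside the chosen areas, or same area
        flows[(a, b)] += passengers

    # Directed probability: passengers from A to B over the population of A.
    # Undirected probability: sum of both directions.
    pair_prob = defaultdict(float)
    for (a, b), count in flows.items():
        pair_prob[frozenset((a, b))] += count / metro_population[a]
    return dict(pair_prob)


# Invented example data with placeholder airport codes.
coupons = [("A1", "B1", 100), ("A2", "B1", 50), ("B1", "A1", 30), ("A1", "X9", 999)]
airports = {"A1": "A", "A2": "A", "B1": "B"}
population = {"A": 1_000, "B": 500}
print(metro_travel_probabilities(coupons, airports, population))
```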
Combining these hierarchical contact networks, which are based on flight data, geographic proximity, and social contact patterns, we created an 18,361-node network using the networkx package. On this network we simulate the spread of SARS-CoV-2 across the 5 metropolitan areas by selecting a random patient 0 in the Seattle metropolitan area, which results in their entire node being exposed. Under the SEIR model explained above, this node remains in the exposed state for 5 time steps and then gets 5 chances (one per time step spent in the infectious state) to infect its neighboring nodes with probability β. On a weekly basis (every 7 time steps), we record the number of susceptible, exposed, infectious, and removed nodes. Total infected nodes is calculated as the sum of nodes in the exposed, infectious, and removed states. We also note the arrival time for each metropolitan area, defined as the time step at which the first node in that metropolitan area becomes exposed. See the diagram below for an example output of the simulation run for 50 days on a smaller dataset that is not representative of the true populations and probabilities.
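A minimal sketch of the simulation loop described above is given here, using a plain adjacency dictionary rather than a networkx graph so that it runs stand-alone. The function name `run_seir`, the order of operations within a time step, and the toy graph are assumptions; arrival-time tracking per metropolitan area is omitted for brevity.

```python
import random


def run_seir(adjacency, patient_zero, beta=0.16, exposed_days=5,
             infectious_days=5, max_days=50, seed=None):
    """Simulate SEIR on a graph; return weekly (day, state counts) records."""
    rng = random.Random(seed)
    state = {node: "S" for node in adjacency}
    timer = {patient_zero: exposed_days}  # days left in the current E or I state
    state[patient_zero] = "E"
    weekly = []

    for day in range(1, max_days + 1):
        # Each infectious node gets one chance per day at each susceptible neighbor.
        newly_exposed = set()
        for node, s in state.items():
            if s == "I":
                for nb in adjacency[node]:
                    if state[nb] == "S" and rng.random() < beta:
                        newly_exposed.add(nb)

        # Advance timers: E -> I after exposed_days, I -> R after infectious_days.
        for node in list(timer):
            timer[node] -= 1
            if timer[node] == 0:
                if state[node] == "E":
                    state[node] = "I"
                    timer[node] = infectious_days
                else:
                    state[node] = "R"
                    del timer[node]

        # Nodes infected today start their exposed period tomorrow.
        for node in newly_exposed:
            state[node] = "E"
            timer[node] = exposed_days

        if day % 7 == 0:
            counts = {s: 0 for s in "SEIR"}
            for s in state.values():
                counts[s] += 1
            weekly.append((day, counts))
    return weekly


toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for day, counts in run_seir(toy_graph, patient_zero=0, seed=1):
    total_infected = counts["E"] + counts["I"] + counts["R"]
    print(day, counts, "total infected:", total_infected)
```

With this ordering, a node that becomes infectious on day t attempts infection on days t+1 through t+5 and is removed at the end of day t+5, which matches the "5 chances" described in the text.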
Question: How do we reflect measures to reduce contacts in the network?
Over the past three months, we've seen various degrees of intervention to try to "flatten the curve" of COVID-19 cases. We attempt to reflect these interventions with the following three travel restrictions:
National Level:
The drastic reduction of air traffic means little contact exists between the greater metropolitan areas, so edges between the 5 areas are completely removed with a 1.00 probability. National edge removal will be tested at Day 7, Day 14, and Day 21.
County Level:
Another layer of intervention was that of businesses encouraging their employees to work from home. Since not all employed people are able to work from home, we remove each edge that exists between counties with a probability of only 0.85. County edge removal will be tested in conjunction with the national restrictions, starting 7 days after the implementation of the respective national restriction. Therefore, county edge removal will be tested at Day 14, Day 21, and Day 28.
Community Level:
The last layer of intervention was state governors issuing orders for residents to shelter in place. All non-essential employees were ordered to stay at home and to leave their homes only for groceries, medications, and other essential needs. Studies show that 30% of employed people in the US are categorized as essential workers, so we remove each edge that exists within a county with a probability of 0.70 [4, 5]. Community edge removal will be tested in conjunction with the national and county restrictions, starting 7 days after the implementation of the respective county restriction. Therefore, community edge removal will be tested at Day 21, Day 28, and Day 35.
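The three restrictions can be sketched as probabilistic edge removal keyed on where an edge's endpoints sit in the hierarchy. The removal probabilities and the 7-day offsets come from the text above; the function names, the `county_of` and `metro_of` lookup dictionaries, and the edge-list representation are assumptions for illustration.

```python
import random

REMOVAL_PROB = {"national": 1.00, "county": 0.85, "community": 0.70}


def edge_level(u, v, county_of, metro_of):
    """Classify an edge by the lowest level that contains both endpoints."""
    if metro_of[u] != metro_of[v]:
        return "national"   # links two metropolitan areas
    if county_of[u] != county_of[v]:
        return "county"     # links two counties in the same area
    return "community"      # lies within a single county


def apply_restriction(edges, level, county_of, metro_of, seed=None):
    """Drop each edge of the given level with that level's removal probability."""
    rng = random.Random(seed)
    kept = []
    for u, v in edges:
        if (edge_level(u, v, county_of, metro_of) == level
                and rng.random() < REMOVAL_PROB[level]):
            continue
        kept.append((u, v))
    return kept


def restriction_schedule(national_day):
    """County removal starts 7 days after national, community 7 days after county."""
    return {"national": national_day,
            "county": national_day + 7,
            "community": national_day + 14}


for start in (7, 14, 21):
    print(restriction_schedule(start))
```

In a full run, each restriction would be applied to the network's edge list once the simulation reaches the scheduled day, with later restrictions applied to whatever edges remain.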
Next: Results & Discussion
References
[2] https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0050074
[3] https://www.wsj.com/articles/coronavirus-latest-news-04-19-2020-11587289731
[4] https://www.nytimes.com/2020/04/18/us/coronavirus-women-essential-workers.html
[5] https://tradingeconomics.com/united-states/employed-persons