Data

Network data

Synthetic networks
- Homogeneous networks:
  - Complete graph on 1000 nodes
  - Regular graph on 1000 nodes with degree=4
- Heterogeneous networks:
  - Varying degree distribution: Graph on 1000 nodes with mean degree = 4 and variances in {4.46, 8.76, 30.35}
  - Varying clustering coefficient: Small-world networks on 1000 nodes with clustering coefficients in {0.05, 0.34, 0.49}
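As an illustration of the quantities named in the list above (mean degree, degree variance, clustering coefficient), here is a minimal standard-library sketch. It is not the code used to generate our networks. The function names, the ring-lattice construction for the regular graph, the Watts-Strogatz-style rewiring, and the rewiring probability `p` are all assumptions made for this example.

```python
import random
from statistics import mean, pvariance


def complete_graph(n):
    """Adjacency sets for the complete graph on n nodes."""
    return {u: set(range(n)) - {u} for u in range(n)}


def ring_lattice(n, k):
    """k-regular ring lattice: each node links to k/2 neighbours on each side."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def small_world(n, k, p, seed=0):
    """Rewire each lattice edge with probability p (Watts-Strogatz style).

    Every rewiring removes one edge and adds one, so the mean degree stays k.
    """
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + step) % n
            if v in adj[u] and rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def degree_stats(adj):
    """Return (mean degree, population variance of the degrees)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return mean(degrees), pvariance(degrees)


def average_clustering(adj):
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(len(adj[v] & nbrs) for v in nbrs) / 2  # edges among neighbours
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


if __name__ == "__main__":
    regular = ring_lattice(1000, 4)
    print("regular:", degree_stats(regular), average_clustering(regular))
    rewired = small_world(1000, 4, p=0.1, seed=1)  # p chosen only for illustration
    print("small-world:", degree_stats(rewired), average_clustering(rewired))
```

In this sketch an unrewired ring lattice with degree 4 has clustering coefficient 0.5, and increasing `p` lowers it, which is one way a range of clustering values could be produced.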
Real-world networks
- AS-Oregon router networks (SNAP Stanford):

This temporal network of Internet routers is grouped into subgraphs called Autonomous Systems (AS). The dataset contains 9 snapshots of the network, taken once a week from March 31, 2001 to May 26, 2001. We wanted the snapshot with the largest number of nodes (≈ 11k nodes), which was the May 26 snapshot, and treated it as a static graph for our calibration. However, the loss value was extremely large and the optimization never converged. This is to be expected: we were trying to calibrate the parameters using human contact and infection data on a contact network of routers, whose contact patterns differ vastly from those of people. We therefore looked for human contact networks next.
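The snapshot selection described above can be sketched as follows. This is a hedged example, not our actual pipeline: it assumes each snapshot is available as a whitespace-separated edge list in which lines starting with `#` are comments, and the function names and the tiny in-memory snapshots are made up for illustration.

```python
def read_edge_list(text):
    """Parse an edge list; return (set of nodes, set of undirected edges)."""
    nodes, edges = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split()[:2]
        nodes.update((u, v))
        if u != v:
            edges.add(frozenset((u, v)))  # ignore direction and duplicates
    return nodes, edges


def largest_snapshot(snapshots):
    """Return (label, node count) of the snapshot with the most nodes."""
    sizes = {label: len(read_edge_list(text)[0]) for label, text in snapshots.items()}
    label = max(sizes, key=sizes.get)
    return label, sizes[label]


if __name__ == "__main__":
    # Toy stand-ins for two weekly snapshots (not real data).
    toy = {
        "2001-03-31": "# toy snapshot\n1 2\n2 3\n",
        "2001-05-26": "# toy snapshot\n1 2\n2 3\n3 4\n4 1\n",
    }
    print(largest_snapshot(toy))
```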

- Student-teacher-staff contact network from a US high school (infectious disease spread):

This network represents contact patterns among students, teachers, and staff in a high school in the USA. It was constructed to trace the spread of flu through proximity interactions. It fits our needs well for the following reasons: it is reasonably large (788 nodes, 118k edges), it models influenza spread, and it was collected in the US. It is therefore closely related to our time series data, and we used it for our calibration.

Refer to the Network Visualisation section to view the various networks.

Time series data

COVID-19 data with geography (CDC):

We obtained weekly COVID-19 case counts up to July 2024 at state-level granularity. However, our focus was the interaction between different strains of a virus, and this dataset provides no strain-specific information, so it was not sufficient.

Dengue phylogenetics (CDC, Nextstrain):

We obtained weekly dengue case counts across the 50 US states between 2010 and 2013, along with the phylogenetic tree of the dengue strains that persisted during this period. However, we lacked the information needed to link each strain to the cases it caused, so we did not proceed with this dataset.

Influenza strains (WHO):

This WHO dataset reports, for each week, the number of tests performed, how many tested positive for influenza (broken down by type, A or B), and how many tested negative. We obtained the data for the whole of the US from Oct 2016 to May 2024.
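To make the structure of one weekly record concrete, the sketch below derives the negative count, the positivity rate, and the share of type A from the three counts described above. The function and field names are hypothetical and do not correspond to the column names in the WHO files. The sketch also assumes every positive test is attributed to exactly one of A or B.

```python
def weekly_summary(tested, positive_a, positive_b):
    """Summarise one week of influenza testing counts.

    Assumes positives split cleanly into type A and type B.
    """
    positive = positive_a + positive_b
    if min(tested, positive_a, positive_b) < 0 or positive > tested:
        raise ValueError("inconsistent weekly counts")
    return {
        "positive": positive,
        "negative": tested - positive,
        # Fraction of all tests that were positive (0.0 if nothing was tested).
        "positivity": positive / tested if tested else 0.0,
        # Fraction of positives that were type A (0.0 if there were none).
        "share_a": positive_a / positive if positive else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts for a single week, for illustration only.
    print(weekly_summary(tested=1000, positive_a=120, positive_b=30))
```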
