Data Science in Epidemiology - Details and Results

Data

SafeGraph Data

This dataset contains the number of user visits from each census tract to each point of interest (POI), aggregated by week. It is used to build the metapopulation graph. It covers 70k+ POIs and 200k+ trip pairs per week, collected over 66 weeks.
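As a rough illustration of how weekly visit counts can be turned into a graph, the sketch below builds one weighted bipartite edge list per week. The record layout, the tract and POI identifiers, and the function name build_weekly_graphs are assumptions made for this example; they are not taken from the SafeGraph schema or from the project code.

```python
from collections import defaultdict

# Hypothetical weekly records: (week, census_tract, poi_id, visit_count).
# All identifiers and counts are made up for illustration.
records = [
    (0, "tract_1", "poi_a", 12),
    (0, "tract_1", "poi_b", 3),
    (0, "tract_2", "poi_a", 7),
    (1, "tract_1", "poi_a", 5),
]


def build_weekly_graphs(rows):
    """Return {week: {(tract, poi): visits}}, one weighted bipartite graph per week."""
    graphs = defaultdict(lambda: defaultdict(int))
    for week, tract, poi, visits in rows:
        # Sum counts in case the same trip pair appears more than once in a week.
        graphs[week][(tract, poi)] += visits
    return {week: dict(edges) for week, edges in graphs.items()}


weekly_graphs = build_weekly_graphs(records)
```

Each key of weekly_graphs is a week index, and each value maps a (tract, POI) trip pair to its visit count, which serves as the edge weight.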

NYC Health Data

This dataset contains the weekly case rate for each New York zip code. It is used as the ground truth for the model's predictions.

Data Cleaning

The key data-cleaning step is creating a mapping from census tracts to ZIP Code Tabulation Areas (ZCTAs) using a combined ratio.
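A minimal sketch of a ratio-based mapping is shown below. It assumes a crosswalk that gives, for each census tract, the share of that tract assigned to each ZCTA, and it splits tract-level visit counts proportionally. The crosswalk values, the identifiers, and the function name tracts_to_zctas are hypothetical, and the exact ratio used in the project is not specified here.

```python
from collections import defaultdict

# Hypothetical crosswalk: share of each census tract assigned to each ZCTA.
# The shares for one tract are assumed to sum to 1.
crosswalk = {
    "tract_1": [("10001", 0.75), ("10002", 0.25)],
    "tract_2": [("10002", 1.0)],
}

# Hypothetical tract-level visits to POIs for one week.
tract_visits = {("tract_1", "poi_a"): 8, ("tract_2", "poi_a"): 4}


def tracts_to_zctas(visits, xwalk):
    """Redistribute (tract, poi) visit counts to (zcta, poi) using crosswalk ratios."""
    out = defaultdict(float)
    for (tract, poi), count in visits.items():
        # Tracts missing from the crosswalk are skipped in this sketch.
        for zcta, ratio in xwalk.get(tract, []):
            out[(zcta, poi)] += count * ratio
    return dict(out)


zcta_visits = tracts_to_zctas(tract_visits, crosswalk)
```

Because the shares for each tract sum to 1, the total number of visits is preserved after the mapping.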

Model Architecture

ZPMNet (Zip Code Tract-POI Mobility Network)

The main idea behind our architecture is to learn a good representation of the dynamics of each POI and each ZCTA, and to use these representations together with visit counts to forecast the future case rate of each ZCTA as a function of the aggregated effect of mobility from each POI.
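The idea of aggregating POI effects over visit counts can be sketched as follows. This is not the ZPMNet implementation: the fixed 2-dimensional vectors stand in for learned representations, the dot-product interaction and the visit-weighted sum are assumptions chosen for simplicity, and all names and numbers are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical representations (in a real model these would be learned).
zcta_repr = {"10001": [0.5, 1.0], "10002": [1.0, 0.0]}
poi_repr = {"poi_a": [2.0, 1.0], "poi_b": [0.0, 4.0]}

# Hypothetical visit counts from each ZCTA to each POI for one week.
visits = {("10001", "poi_a"): 10, ("10001", "poi_b"): 5, ("10002", "poi_a"): 20}


def aggregate_mobility_effect(zcta_repr, poi_repr, visits):
    """For each ZCTA, sum the visit-weighted interaction with every visited POI."""
    totals = {zcta: 0.0 for zcta in zcta_repr}
    for (zcta, poi), count in visits.items():
        # Assumed interaction score: dot product of the two representations.
        totals[zcta] += count * dot(zcta_repr[zcta], poi_repr[poi])
    return totals


effect = aggregate_mobility_effect(zcta_repr, poi_repr, visits)
```

In a trained model the resulting per-ZCTA quantity would be passed through further layers to produce the case-rate forecast; here it only illustrates how visit counts weight each POI's influence.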

Figure: ZPMNet model architecture

Results

The table below lists some common baselines along with their reported accuracies for forecasts r weeks ahead. The plot below shows the predictions for 8 randomly selected ZCTAs.

Performance of ZPMNet compared to ground truth

We observe that ZPMNet performs significantly better than all baselines, with average improvements of 52% to 220%. As we forecast farther into the future, the performance of the baselines degrades faster than that of our model.

We also observed that for over 63% of all ZCTAs, our model performs over 200% better, while the predictions of the other models fail to capture the general trends of ZCTA case rates.

Hotspot Detection

To detect hotspots, we consider a weight as defined below (following the naming conventions defined in the architecture). Essentially, this quantity models each POI's contribution to each region. A POI with a higher value is one that has effectively caused more infections.

The table below lists the 10 POIs with the highest values. We observe that most of them are dining spots, which have a high frequency of airborne transmission; the other POIs include clubs and schools.
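Given per-region contribution weights, ranking the POIs can be sketched as below. The weight values are hypothetical placeholders rather than model outputs, and summing each POI's weight across all ZCTAs before ranking is an assumption made for this example; the function name top_pois is also illustrative.

```python
# Hypothetical contribution weights, keyed by (zcta, poi).
weights = {
    ("10001", "poi_a"): 0.5,
    ("10002", "poi_a"): 0.25,
    ("10001", "poi_b"): 0.125,
    ("10002", "poi_c"): 1.0,
}


def top_pois(weights, k=10):
    """Sum each POI's weight over all ZCTAs and return the k largest, highest first."""
    totals = {}
    for (_zcta, poi), w in weights.items():
        totals[poi] = totals.get(poi, 0.0) + w
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


ranking = top_pois(weights, k=2)
```

With k=10 on the full weight table, the same function would return the ten candidate hotspot POIs.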
