Title: Curating a COVID-19 data repository and forecasting county-level death counts in the United States: A data perspective
Abstract:
As the COVID-19 outbreak evolves, accurate forecasting continues to play a critical role in informing policy decisions. However, forecasting models inherently depend on the availability and, more importantly, the quality of the underlying data. In this talk, we present our ongoing curation of a large data repository containing COVID-19 information from more than 20 public sources. At the hospital level, our data include each hospital's location, number of ICU beds, total number of employees, and hospital type. At the county level, our data include reported COVID-19 case and death counts, updated automatically every day, as well as demographics, socioeconomic information, health resource availability, health risk factors, and social mobility. We also discuss important considerations and challenges we faced in building this repository, particularly in ensuring the highest possible data quality for the broader community in the fight against COVID-19. Our full data repository, along with extensive documentation, is open source and publicly available on GitHub at https://github.com/Yu-Group/covid19-severity-prediction.