The key data is the weekly hospitalization rate per 100,000 individuals. Attendees will build and calibrate their models on this data and use them to make predictions forward in time.
Data will be released in chunks, typically covering about 8 weeks for the first release and 6 weeks for each subsequent release. Each data file will contain a header row followed by two comma-separated columns: the first column is the week and the second is the number of hospitalizations per 100,000 people. After each release, Workshop attendees will generate their own file containing predictions for the weeks in the data release file plus predictions for four weeks into the future.
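A minimal Python sketch of reading one data release, assuming the layout described above (a header row, then week and rate separated by a comma). The column names and the week format in the sample are hypothetical, so the week is kept as a string rather than parsed as a number or date.

```python
import csv
import io


def read_release(text):
    """Parse a data release: a header row, then one week,rate line per week.

    Returns (header, rows), where rows is a list of (week, rate) tuples.
    The week stays a string because its real format is an assumption here.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows.append((record[0].strip(), float(record[1])))
    return header, rows


# Hypothetical sample; the real column names and week format may differ.
sample = """week,hospitalizations_per_100k
1,0.8
2,1.1
3,1.7
"""
header, rows = read_release(sample)
print(header)  # ['week', 'hospitalizations_per_100k']
print(rows)    # [('1', 0.8), ('2', 1.1), ('3', 1.7)]
```

To read a release from disk, pass the file contents, e.g. `read_release(open(path, newline="").read())`, where `path` is whatever file name the Workshop uses.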
After data is released, teams will predict the new hospitalization rate per 100,000 people and upload their predictions to the Workshop's GitHub repository. The prediction file uses the same format as the data release file: a header row, then one line per week giving the week and the predicted hospitalization rate. Optionally, a team may include an additional column with an estimated Rt value.
Note that an Rt prediction is only needed for the weeks being predicted, though it can be included on the lines for all weeks.
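The sketch below formats a prediction file with the optional Rt column, filling Rt only for the forecast weeks. The header names `week`, `hospitalizations_per_100k`, and `Rt` are hypothetical placeholders, and leaving the Rt field blank on the fitted weeks assumes an empty field is acceptable; check the Workshop's own files before submitting.

```python
import csv
import io


def write_predictions(predictions, rt=None):
    """Format a prediction file: a header row, then one line per week.

    predictions: list of (week, predicted_rate) pairs, covering the weeks in
        the data release plus the four forecast weeks.
    rt: optional dict mapping week -> estimated Rt. Weeks absent from the
        dict get an empty Rt field (an assumption about the accepted format).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = ["week", "hospitalizations_per_100k"]  # hypothetical column names
    if rt is not None:
        header.append("Rt")
    writer.writerow(header)
    for week, rate in predictions:
        row = [week, f"{rate:.3f}"]
        if rt is not None:
            row.append(f"{rt[week]:.2f}" if week in rt else "")
        writer.writerow(row)
    return out.getvalue()


# Weeks 1-3 are fitted to the release; weeks 4-7 are the four-week forecast.
preds = [("1", 0.8), ("2", 1.1), ("3", 1.7),
         ("4", 2.4), ("5", 3.0), ("6", 3.3), ("7", 3.2)]
rt = {"4": 1.3, "5": 1.15, "6": 1.02, "7": 0.95}
text = write_predictions(preds, rt)
print(text)
```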
In this example, 6 weeks of hospitalization data are released, shown as the black line with open circles. Each team makes a prediction of hospitalizations three weeks into the future, shown as the red dots. A team will likely also generate a fitted value for each week in the data release, which it may submit, but those values do not contribute to the scoring.
Once the teams have submitted their first prediction, the second set of hospitalization data is released. As the teams create a new four-week prediction, the judges will score the previous submissions against the newly released data.
The judges will calculate the Root Mean Square Error (RMSE) for each team's four predicted data points:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - a_i\right)^2}$$

In the equation, $p_i$ is the team's prediction for week $i$, $a_i$ is the actual value for that week, and $n$ is the number of weeks forward being predicted.
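A minimal Python sketch of the RMSE for a single submission, pairing each predicted value with the actual value for the same week. The forecast and actual numbers are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean square error over the n forecast weeks."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(predicted)
    squared_errors = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_errors / n)


# Hypothetical four-week forecast versus the newly released values.
p = [2.4, 3.0, 3.3, 3.2]
a = [2.0, 3.0, 3.5, 3.6]
print(round(rmse(p, a), 4))  # → 0.3
```

Because the errors are squared before averaging, one badly missed week (such as a missed peak) raises the score more than several small misses of the same total size.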
Using the newly released data, the teams submit their next four-week prediction.
Once the teams have submitted their second prediction, the third set of hospitalization data is released. As the teams create a new four-week prediction, the judges will score the previous submissions against the newly released data.
Using the newly released data, the teams submit their next four-week prediction.
Once the teams have submitted their third prediction, the final set of hospitalization data is released. The judges will score the final submission.
The judges will then sum each team's RMSE values across all submissions. That sum will be the team's final score for the challenge.
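The scoring loop can be sketched in Python as follows: each submission's forward predictions are scored against the next data release, and the per-submission RMSE values are summed. Matching weeks by dictionary key, and treating a lower total as better (usual for error metrics), are assumptions for illustration rather than stated judging rules; the numbers are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between paired predictions and observations."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def total_score(rounds):
    """Sum one team's RMSE over all scored submissions.

    rounds: list of (forecast, observed) pairs of dicts keyed by week.
    forecast holds one submission's forward predictions; observed holds the
    values from the next data release. Only forecast weeks that appear in
    observed are scored (an assumption about how weeks are matched).
    """
    total = 0.0
    for forecast, observed in rounds:
        weeks = [w for w in forecast if w in observed]
        if not weeks:
            raise ValueError("no forecast week appears in the released data")
        total += rmse([forecast[w] for w in weeks], [observed[w] for w in weeks])
    return total


# Hypothetical two-round example with week numbers as keys.
rounds = [
    ({7: 2.4, 8: 3.0, 9: 3.3, 10: 3.2}, {7: 2.0, 8: 3.0, 9: 3.5, 10: 3.6}),
    ({13: 1.5, 14: 1.5, 15: 1.5, 16: 1.5}, {13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0}),
]
print(round(total_score(rounds), 4))  # → 0.8
```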
NOTE: Unlike the example data given above, the actual challenge data may include more than one peak in the weekly new hospitalizations data.
The Effective Reproduction Number (Rt, also known as Re) is the average number of secondary cases caused by a single infected person at a specific point in time t. Rt changes over time; it drops when people get vaccinated, wear masks, practice social distancing, or develop natural immunity.
The Basic Reproduction Number (R0, pronounced "R-naught") is the average number of people one infected person will spread a disease to in a completely naive population (no prior immunity, no vaccines, and no interventions). R0 is treated as a fixed property of a pathogen in a given population: unlike Rt, it does not change as immunity builds or interventions are introduced. R0 is primarily used at the very beginning of an outbreak to estimate how inherently transmissible a pathogen is.
Both values use the same threshold to measure whether an epidemic is growing or shrinking:
>1.0: The outbreak is growing (each person infects more than one other person).
=1.0: The disease is endemic and stable (each person infects exactly one other person).
<1.0: The outbreak is shrinking and will eventually die out (each infected person infects fewer than one other person).
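The threshold rule above can be written as a small Python helper, for example to label the Rt values in a prediction file. The tolerance `tol` is a hypothetical parameter: a computed estimate is rarely exactly 1.0, so values within `tol` of 1.0 are reported as stable.

```python
def epidemic_trend(r, tol=1e-9):
    """Classify a reproduction number (Rt or R0) against the 1.0 threshold."""
    if r > 1.0 + tol:
        return "growing"    # each case infects more than one other person
    if r < 1.0 - tol:
        return "shrinking"  # each case infects fewer than one other person
    return "stable"         # each case infects about one other person


for value in (1.3, 1.0, 0.8):
    print(value, epidemic_trend(value))
# 1.3 growing
# 1.0 stable
# 0.8 shrinking
```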
For more information on R0 and Rt, see "Behind the Model: CDC's Tools to Assess Epidemic Trends" and Jewell and Lewnard (2021).
In a simple epidemic model (like the Susceptible-Infectious-Recovered or SIR model), the baseline relationship is:
$$R_t = R_0 \cdot s(t)$$
Rt: The effective reproduction number at time t.
R0: The basic reproduction number.
s(t): The fraction of the population that is susceptible to infection at time t (expressed as a decimal between 0 and 1).
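To see the relationship in action, here is a minimal Python sketch of a normalized SIR model that records Rt = R0 · s(t) each day. It assumes the standard SIR relation beta = R0 · gamma and uses forward-Euler integration for brevity; the parameter values are illustrative only, not calibrated to the challenge data, and the model tracks infections rather than hospitalizations.

```python
def simulate_sir(r0, gamma, s0, i0, days, dt=0.1):
    """Euler-integrate a normalized SIR model.

    s and i are population fractions; dt (in days) should divide one day
    evenly. Returns one (day, s, i, rt) row per day, with rt = r0 * s.
    """
    beta = r0 * gamma  # transmission rate implied by R0 = beta / gamma
    steps_per_day = round(1 / dt)
    s, i = s0, i0
    rows = []
    for day in range(days + 1):
        # Record the state at the start of each day.
        rows.append((day, s, i, r0 * s))
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
    return rows


# Illustrative values: R0 = 2.5, a 5-day infectious period, 0.1% initially infected.
rows = simulate_sir(r0=2.5, gamma=0.2, s0=0.999, i0=0.001, days=160)
peak_day, s_peak, i_peak, rt_peak = max(rows, key=lambda row: row[2])
print(f"prevalence peaks on day {peak_day} with Rt = {rt_peak:.2f}")
print(f"Rt on day 0: {rows[0][3]:.2f}, on day 160: {rows[-1][3]:.2f}")
```

In the SIR model, dI/dt = (beta · s − gamma) · I = gamma · (Rt − 1) · I, so the number of infectious people grows while Rt > 1 and peaks as Rt falls through 1, which is why the printed Rt at the peak day should be close to 1.0.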