Challenge Format

GitHub modeling-the-invisible-workshop

You should have received an email with a link to the Google folder where you will upload your predictions.
https://drive.google.com/drive/folders/1je1fk0lby9QufwFYZPfkSkVTHeirOFYh

Data Release Format

The key data series is the weekly hospitalization rate per 100,000 individuals. Attendees will build and calibrate their models on this series and use them to make predictions forward in time.

Data will be released in chunks, typically covering about 8 weeks for the first release and 6 weeks for each subsequent release. The data file will contain a header row and two comma-separated columns. The first column will be the week and the second column the number of hospitalizations per 100,000 people. After each release, Workshop attendees will generate their own file with their predictions for the dates in the data release file plus predictions for four weeks into the future.
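As an illustration of the release format described above, the following is a minimal Python sketch for reading such a file. The header names (week, hosp_per_100k), the week labels, and the sample values are assumptions made for this example; the actual release files may label their columns differently.

```python
import csv
import io

# Hypothetical release file contents; header names and values are assumptions.
SAMPLE_RELEASE = """week,hosp_per_100k
1,0.8
2,1.3
3,2.1
"""


def read_release(text):
    """Parse a two-column, comma-separated release file with a header row.

    Returns a list of (week, rate) tuples. The week is kept as a string
    because the source does not say whether weeks are indices or dates.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        rows.append((row[0].strip(), float(row[1])))
    return rows


release = read_release(SAMPLE_RELEASE)
print(release)
```

To read a file from disk instead, pass the result of open(path).read() to read_release.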

Team Data Submission Format

After each data release, teams will predict the new hospitalization rate per 100,000 people and upload their predictions to the Workshop's GitHub repository. The prediction file format is the same as the data release format: a header row, then one line per week giving the week and the predicted hospitalization rate. Optionally, a team may include an additional column with an estimated Rt value.

Note that an Rt prediction is only needed for the weeks being predicted, though it can be included on the lines for all weeks.
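A prediction file in this format could be produced as in the sketch below. The column names and the choice to leave the Rt cell empty for weeks without an estimate are assumptions for illustration only; check the Workshop's repository for the exact conventions expected.

```python
import csv
import io


def write_predictions(weeks, hosp_rates, rt_values=None):
    """Return CSV text: a header row, then one line per week.

    rt_values is optional. An entry of None leaves that week's Rt cell
    empty (an assumption; the source does not define a missing-value marker).
    """
    if len(weeks) != len(hosp_rates):
        raise ValueError("weeks and hosp_rates must have the same length")
    if rt_values is not None and len(rt_values) != len(weeks):
        raise ValueError("rt_values must have one entry per week")

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    header = ["week", "hosp_per_100k"]  # hypothetical column names
    if rt_values is not None:
        header.append("rt")
    writer.writerow(header)

    for idx, (week, rate) in enumerate(zip(weeks, hosp_rates)):
        row = [week, rate]
        if rt_values is not None:
            rt = rt_values[idx]
            row.append("" if rt is None else rt)
        writer.writerow(row)
    return buf.getvalue()


# Week 7 is a released week (no Rt given); week 8 is a predicted week.
print(write_predictions([7, 8], [3.2, 2.9], [None, 0.95]))
```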

1st Data Release and Team's Prediction

In this example, 6 weeks of hospitalization data are released, shown as the black line with open circles. Each team predicts hospitalizations three weeks into the future, shown as the red dots. A team will likely also generate a data point for each released week, which it may submit, but those data points do not contribute to the scoring.

1st Scoring and 2nd Data Release

Once the teams have submitted their first prediction, the second set of hospitalization data is released. While the teams create a new four-week prediction, the judges will score the previous submissions against the newly released data.

Error Calculation

The judges will calculate the Root Mean Square Error (RMSE) over each team's predicted data points. In addition, the RMSE of each team's Rt (effective reproduction number) estimates is calculated. RMSE has the standard form

RMSE = sqrt( (1/n) * sum over i of (p_i - a_i)^2 )

In the equation, p_i is the team's prediction for week i, a_i is the actual value for that week, and n is the number of weeks forward that are being predicted.
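The RMSE calculation described above can be sketched in Python as follows. This is an illustrative implementation of the standard RMSE definition, not the judges' actual scoring script, and the sample values are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Four predicted weeks versus the values released afterwards (made-up numbers).
print(rmse([3.0, 4.0, 5.0, 6.0], [2.0, 4.0, 6.0, 6.0]))
```

The same function applies to the Rt column, with the team's Rt estimates as predicted and the reference Rt values as actual.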

The two scores are combined to create a "round_score" using:

round_score = (HOSP_WEIGHT * hosp_nrmse) + (R0_WEIGHT * r0_rmse)
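The weighted combination above can be written as a small Python function. The weight values below are placeholders chosen only so the example runs; the source does not state the actual HOSP_WEIGHT and R0_WEIGHT, nor how hosp_nrmse is normalized.

```python
# Placeholder weights: the actual values used by the judges are not given.
HOSP_WEIGHT = 0.5
R0_WEIGHT = 0.5


def round_score(hosp_nrmse, r0_rmse,
                hosp_weight=HOSP_WEIGHT, r0_weight=R0_WEIGHT):
    """Combine the hospitalization and reproduction-number errors."""
    return (hosp_weight * hosp_nrmse) + (r0_weight * r0_rmse)


# Made-up error values for one round.
print(round_score(hosp_nrmse=0.2, r0_rmse=0.4))
```

Because both terms are errors, a lower round_score indicates a closer match to the released data.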

2nd Data Release and Team's Prediction

Using the newly released data, the teams submit their next four-week prediction.

2nd Scoring and 3rd Data Release

Once the teams have submitted their second prediction, the third set of hospitalization data is released. While the teams create a new four-week prediction, the judges will score the previous submissions against the newly released data.

3rd Data Release and Team's Prediction

Using the newly released data, the teams submit their next four-week prediction.

3rd Scoring

Once the teams have submitted their third prediction, the final set of hospitalization data is released. The judges will score the final submission.

Final Scoring

The judges will then calculate the total RMSE for each team across all the submissions. That sum will be the team's final score for the challenge.

NOTE: Unlike the example data given above, the actual challenge data may include more than one peak in the weekly new hospitalizations data.

Rt Estimate

The Effective Reproduction Number (Rt, also known as Re) tracks the average number of secondary cases caused by a single infected person at a specific point in time t. Rt changes over the course of an outbreak; it drops when people get vaccinated, wear masks, practice social distancing, or develop natural immunity.

Rt versus R0

The Basic Reproduction Number (R0, pronounced "R-naught") is the average number of people one infected person will spread a disease to in a completely naive population (no prior immunity, no vaccines, and no interventions). R0 is a fixed characteristic of an infectious agent and does not change over time. R0 is primarily used at the very beginning of an outbreak to determine how inherently transmissible a pathogen is.

Interpretation of Rt and R0

Both values use the same threshold to measure whether an epidemic is growing or shrinking:

>1.0: The outbreak is growing (each infected person infects, on average, more than one other person).
=1.0: The disease is endemic and stable (each infected person infects, on average, exactly one other person).
<1.0: The outbreak is shrinking and will eventually die out (each infected person infects, on average, fewer than one other person).

For more information on R0 and Rt, see Behind the Model: CDC's Tools to Assess Epidemic Trends, and Jewell and Lewnard 2021.

In a simple epidemic model (like the Susceptible-Infectious-Recovered or SIR model), the baseline relationship is:

Rt = R0 * s(t)

Rt: The effective reproduction number at time t.
R0: The basic reproduction number.
s(t): The fraction of the population that is susceptible to infection at time t (expressed as a decimal between 0 and 1).
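To make the relationship Rt = R0 * s(t) concrete, the sketch below steps a basic SIR model forward in time and records Rt at each step. The parameter values (R0 of 2.5, a recovery rate of 0.5 per week, an initial infectious fraction of 0.001) and the simple forward-Euler update are assumptions chosen for illustration; they are not the Workshop's model.

```python
def simulate_rt(r0, gamma, s0, i0, steps, dt=1.0):
    """Forward-Euler SIR model on population fractions.

    Returns a list of (s, rt) pairs, one per time step, where
    rt = r0 * s is the effective reproduction number at that step.
    """
    beta = r0 * gamma  # in the SIR model, R0 = beta / gamma
    s, i = s0, i0
    history = []
    for _ in range(steps):
        history.append((s, r0 * s))
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
    return history


# Hypothetical parameters: weekly time step, almost fully susceptible start.
history = simulate_rt(r0=2.5, gamma=0.5, s0=0.999, i0=0.001, steps=40)
for week, (s, rt) in enumerate(history[:5]):
    print(week, round(s, 4), round(rt, 4))
```

As the susceptible fraction s(t) is depleted, Rt falls from close to R0 toward values below 1, which is the point at which the outbreak starts to shrink.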

Prediction submission format with Rt estimate
