Epidemics from the Eye of the Pathogen

References and theoretical details can be found at https://epubs.siam.org/doi/10.1137/21M1450719

This is joint work with Faryad Sanheh, Joe Watkins, and Joceline Lega.

Top: The numerical solution to the SIR dynamical system with N=1000, beta=.4, gamma=.1. Bottom: The epi-curve showing incidence as a function of time for 20 stochastic simulations on fully-mixed (complete) networks.

Epidemics in the Time Domain:

Epidemics are traditionally viewed through the lens of time. Most simply, we can consider an SIR (susceptible-infectious-removed) dynamical system. This model makes many simplifying assumptions; however, it has been shown to work well for analyzing general epidemic trajectories. A visual of the trajectories can be seen on the left.

The SIR dynamics visualized as a flow chart. Beta encapsulates the infectiousness of the disease, N is the total population size and gamma denotes the recovery rate.

In this model, infection occurs at a rate proportional to the product of the susceptible and infectious population sizes, and recovery occurs at a rate proportional to the infectious population size. This can be thought of as "interaction is necessary for infection": without both infectious and susceptible people, no new infections will occur.
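The rates described above can be turned into a small numerical experiment. The following is a minimal sketch, not the code used to produce the figures: it integrates the SIR system with a fixed-step fourth-order Runge-Kutta scheme, using the parameters from the top-left caption (N=1000, beta=.4, gamma=.1). The function names, the single initial infection, the 100-day horizon, and the step size are assumptions made for illustration.

```python
def sir_rhs(s, i, r, beta, gamma, n):
    """Right-hand side of the SIR system: returns (dS/dt, dI/dt, dR/dt)."""
    new_infections = beta * s * i / n   # proportional to S times I
    recoveries = gamma * i              # proportional to I
    return -new_infections, new_infections - recoveries, recoveries


def rk4_step(state, dt, beta, gamma, n):
    """Advance (S, I, R) by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))

    k1 = sir_rhs(*state, beta, gamma, n)
    k2 = sir_rhs(*shifted(state, k1, dt / 2), beta, gamma, n)
    k3 = sir_rhs(*shifted(state, k2, dt / 2), beta, gamma, n)
    k4 = sir_rhs(*shifted(state, k3, dt), beta, gamma, n)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_sir(n=1000, beta=0.4, gamma=0.1, i0=1.0, days=100.0, dt=0.05):
    """Return a list of (t, S, I, R) tuples; i0, days, and dt are assumed values."""
    state = (n - i0, i0, 0.0)
    trajectory = [(0.0, *state)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        state = rk4_step(state, dt, beta, gamma, n)
        trajectory.append((k * dt, *state))
    return trajectory


if __name__ == "__main__":
    traj = integrate_sir()
    t_peak, _, i_peak, _ = max(traj, key=lambda row: row[2])
    print(f"infectious peak of about {i_peak:.0f} people near day {t_peak:.0f}")
```

Plotting the S, I, and R columns of the returned trajectory against the time column gives curves of the kind shown in the top figure.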

Traditionally, epidemiologists track an epidemic over the course of time. This feels very natural for us, as humans; however, we show that this leads to unnecessary uncertainty when making predictions and analyzing the spread. The bottom plot on the left is an "epi-curve." It shows the incidence (new cases) as a function of time. This should not be confused with the infectious curve (orange in the top figure).

The problem arises when trying to make predictions based on one curve. The various curves are taken from random simulations of the same disease on the same network. It can be difficult to distinguish between the yellow curve and the others. This is because we analyze the epidemic from the perspective of time.
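To see this run-to-run variability directly, one can simulate the stochastic version of the same model. The sketch below uses a standard Gillespie (event-by-event) simulation on a fully-mixed population and bins new infections by day to form an epi-curve. It is an illustrative stand-in and not GEMFsim, the tool used for the figures; the function name, the single initial infection, and the daily binning are assumptions.

```python
import random


def gillespie_epi_curve(n=1000, beta=0.4, gamma=0.1, i0=1, seed=None):
    """Simulate one stochastic SIR epidemic; return new infections per day."""
    rng = random.Random(seed)
    s, i = n - i0, i0
    t = 0.0
    daily = []  # daily[d] = number of infections with d <= t < d + 1
    while i > 0:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total_rate = rate_infection + rate_recovery
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if rng.random() < rate_infection / total_rate:
            s -= 1
            i += 1
            day = int(t)
            while len(daily) <= day:
                daily.append(0)
            daily[day] += 1
        else:
            i -= 1
    return daily


if __name__ == "__main__":
    # Same disease, same population, different random seeds.
    for seed in range(5):
        curve = gillespie_epi_curve(seed=seed)
        if curve:
            peak_day = max(range(len(curve)), key=curve.__getitem__)
            print(f"seed {seed}: {sum(curve)} cases, peak on day {peak_day}")
        else:
            print(f"seed {seed}: outbreak died out before any transmission")
```

Running it with several seeds shows how the timing of the peak shifts between realizations, even though every run uses identical parameters.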

Changing the perspective: Cumulative Cases and ICC Curve

We start by considering 6000 simulations with the same parameters as those above. We plot their epi-curves on the right. While there is a general shape to them, there is significant spread between realizations. With only one random curve (or an actual epidemic unfolding), it can be difficult to estimate disease characteristics, like infectivity, and when peak incidence would occur.

However, if we consider tracking the epidemic in terms of cumulative cases instead of time, all 6000 realizations appear to align very nicely along an average curve. Not only does this bring the curves more in line, but an equation for the black dotted line is also known.
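For the deterministic SIR model described earlier, a curve of this type can be derived in a few lines. The derivation below is a sketch under stated assumptions: C = N - S denotes cumulative cases, R_0 = beta/gamma, and the number of initial infections is taken to be negligible. The notation is chosen here for illustration; the linked paper should be consulted for the exact form and initial-condition terms used for the dotted line.

```latex
\begin{align}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I, \\
C = N - S \quad &\Longrightarrow \quad
\frac{dC}{dt} = \beta \left(1 - \frac{C}{N}\right) I, \\
\frac{dI}{dC} = 1 - \frac{N}{R_0 \,(N - C)} \quad &\Longrightarrow \quad
I = C + \frac{N}{R_0} \ln\!\left(1 - \frac{C}{N}\right), \\
\frac{dC}{dt} &= \beta \left( C + \frac{N}{R_0} \ln\!\left(1 - \frac{C}{N}\right) \right) \left(1 - \frac{C}{N}\right).
\end{align}
```

The last line expresses incidence as a function of cumulative cases only, with no explicit dependence on time.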

This comes from looking at an epidemic as a random process where either an infection occurs or it does not (recovery). We then count the number of cases per day and can generate the plot to the right. The key finding from this is that each event (infection or recovery) is independent of all previous events. That means it does not matter what happened in the past in order to predict the future. This is not the case for analysis from the time perspective.

Not only does this allow us to make more accurate predictions, but they can be done without the need for numerical simulations or intense computations. Instead, if we know the number of cases, we can predict the number of new cases the next day by doing simple statistical calculations.
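As a rough illustration of how cheap such a calculation is, the sketch below evaluates the mean incidence implied by the deterministic SIR model as a function of cumulative cases C, namely beta * (C + (N/R_0) ln(1 - C/N)) * (1 - C/N) with R_0 = beta/gamma, and uses it for a one-day-ahead estimate. This is only a point estimate of the mean under the assumption of negligible initial infections; it is not the statistical procedure from the paper, and the function names are hypothetical.

```python
import math


def icc_incidence(c, n, beta, gamma):
    """Mean incidence (new cases per day) at cumulative case count c.

    Follows from the deterministic SIR model with C = N - S and a
    negligible number of initial infections (an assumption of this sketch).
    """
    if not 0 <= c < n:
        raise ValueError("cumulative cases must satisfy 0 <= c < n")
    r0 = beta / gamma
    infectious = c + (n / r0) * math.log(1.0 - c / n)
    return beta * infectious * (1.0 - c / n)


def predict_next_day(c_today, n, beta, gamma):
    """One-day-ahead estimate: (expected new cases, expected cumulative cases).

    Treats the incidence rate as constant over the day, i.e. a single
    forward-Euler step of length one day.
    """
    new_cases = max(0.0, icc_incidence(c_today, n, beta, gamma))
    return new_cases, min(n, c_today + new_cases)


if __name__ == "__main__":
    new_cases, total = predict_next_day(500, n=1000, beta=0.4, gamma=0.1)
    print(f"expected new cases tomorrow: {new_cases:.1f} (cumulative {total:.1f})")
```

No simulation is involved: the estimate needs only the current cumulative case count and the three parameters.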

This also allows for easy comparison across different populations and locations. As seen below, we simulate the same disease on populations of different sizes. It is difficult to distinguish a general pattern in the epi-curves; however, when we look at cumulative cases, both scaled and unscaled, we can see the hidden universality of the spread.

Top: The epi-curves for 6000 simulations of an epidemic on a complete network. R_0 = 2 and N=2500. Bottom: The ICC curves for 6000 simulations scaled according to the final size of the epidemic. All simulations were completed using GEMFsim.

The epi-curve for 1000 simulations for various population sizes with the same disease parameters. No discernable pattern appears from visual analysis.

The ICC curves for 1000 simulations for various population sizes. The universality begins to appear. However, the population size still plays a role in determining each curve.

The ICC curves for 1000 simulations with various population sizes scaled to their final epidemic size. The clear universality of the disease can be seen across different population sizes.

Why does this matter:

We can apply this to the COVID-19 pandemic. In Arizona specifically, when the epidemic is analyzed from the perspective of time, it is difficult to see when public health policy takes effect, whether it has an impact, and the general trajectory of the pandemic. However, when we consider the cumulative cases perspective, it can be much easier to extract disease and spread properties.

For Arizona, we can see three distinct waves in 2020: the first from March to May, the second from May to August, and the third through the end of the year. However, it is difficult to distinguish when these start and stop within the epi-curve.

Not only is identifying waves easier, but we can also see that the R_0 value (basic reproductive number) does not vary greatly between the waves; instead, the number of people potentially exposed to the epidemic grows. At the beginning of the epidemic, only ~50,000 people were at immediate risk of contracting the virus. However, after May 15, when the stay-at-home order was lifted, this number grew to ~280,000. Finally, during the third wave, nearly the whole population of Arizona becomes part of the population interacting with the disease.

Top: The epi-curve for Arizona in 2020 with respect to the COVID-19 epidemic. Three dates are highlighted for comparison purposes. Bottom: The ICC curves for Arizona in 2020 with respect to the COVID-19 epidemic. Three clear waves can be seen. The points used for the R_0 and N estimations are given, and the estimates for each wave are indicated on the right. These figures were generated by Joceline Lega and can be found in the paper linked above. The data is made available from the COVID Tracking Project by The Atlantic.

6000 simulations of an epidemic on a complete network. The red and blue curves denote one and two standard deviations from the mean (black) respectively. Most curves fall within two standard deviations. The distribution at the top of the curve and the end of the curve are of specific interest as they denote the peak incidence rates and the final epidemic size.

Predicting Final Epidemic Size and Peak Incidence Rates

Finally, this method yields the distribution of the final epidemic size (a previously known result) and the distribution of incidence at the peak of the theoretical ICC curve. This means that, based on known data, we can give confidence intervals for the final epidemic size and for how bad the epidemic will be at its worst. These calculations can be done without the need for hours of CPU time in simulations or more complex statistical methods like MCMC. Instead, these calculations can be done in seconds on a laptop PC.
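To give a sense of how light these calculations are, the sketch below computes the two central values around which those distributions sit, using the mean curve implied by the deterministic SIR model: the final size is where incidence returns to zero, and the peak is where incidence is largest as a function of cumulative cases. It does not compute the distributions or confidence intervals themselves, which are given in the linked paper. The function names and the scaled variable z = C/N are assumptions made for illustration.

```python
import math


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection, assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaled_icc(z, r0):
    """Mean incidence divided by beta * N, as a function of z = C / N."""
    return (z + math.log(1.0 - z) / r0) * (1.0 - z)


def final_size_fraction(r0):
    """Nonzero solution of z + ln(1 - z) / R_0 = 0 (requires R_0 > 1)."""
    if r0 <= 1.0:
        raise ValueError("a large outbreak requires R_0 > 1")
    # The left-hand side is positive at z = 1 - 1/R_0 and negative near z = 1
    # for moderate R_0; for very large R_0 the fraction is essentially 1.
    return bisect_root(lambda z: z + math.log(1.0 - z) / r0,
                       1.0 - 1.0 / r0, 1.0 - 1e-12)


def peak_fraction(r0):
    """Value of z = C / N at which the mean ICC curve is largest."""
    # Derivative of scaled_icc with respect to z.
    slope = lambda z: 1.0 - 2.0 * z - (1.0 + math.log(1.0 - z)) / r0
    return bisect_root(slope, 0.0, final_size_fraction(r0))


def peak_incidence(n, beta, gamma):
    """Largest mean incidence (new cases per day) along the ICC curve."""
    r0 = beta / gamma
    return beta * n * scaled_icc(peak_fraction(r0), r0)


if __name__ == "__main__":
    n, beta, gamma = 1000, 0.4, 0.1
    r0 = beta / gamma
    print(f"final size: about {n * final_size_fraction(r0):.0f} of {n} people")
    print(f"peak mean incidence: about {peak_incidence(n, beta, gamma):.0f} cases/day")
```

Both quantities come from a few dozen bisection steps, which is consistent with the claim that no long-running simulation is needed for this kind of estimate.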

Making accurate predictions during a pandemic can be an unforgiving task, especially when these predictions are used to inform public health. However, the ICC statistical analysis can offer a new perspective for this analysis that, when used in conjunction with the many other highly accurate methods, can be used to make better predictions. No one method should be the end-all-be-all for public health predictions. But, combining various perspectives, from simple (ICC) to complex, can help in the total understanding of epidemic outbreaks and what can be done to reduce their spread.
