What Is Regression to the Mean?

All roadways carry an element of risk based on their physical characteristics. However, crashes are also influenced by underlying random variables that create “noise” that cannot be predicted. This noise is part of what makes crashes rare and random events; in other words, observed crash counts fluctuate around the expected average, sometimes to extremes. Regression to the mean occurs when a random variable is extreme in a single measurement but closer to the mean in previous or future measurements (Dilipkumar, 2020).

Figure 1 shows an illustration of regression to the mean commonly used by the Federal Highway Administration (FHWA) (https://safety.fhwa.dot.gov/hsip/resources/fhwasa09029/sec2.cfm).

How Regression to the Mean May Overestimate Safety Benefits

Typically, high crash frequencies (those above the expected average) attract the attention of analysts. If an extreme observation is used as the “before installation” observation, then comparing it with the “after installation” observation may overestimate the safety benefit, because the crash frequency would likely have moved back toward the average even without the treatment.


Example 1: How Regression to the Mean Impacts Benefit Calculations

A signal-controlled intersection attracts the attention of an analyst because there were two suspected serious injury left-turn crashes in a single year. A traffic study is conducted and determines that an appropriate mitigation is to change the left-turn approaches on the major route from permissive signal phasing to permissive-protected phasing. The phasing change is implemented, crashes are monitored for a year following the installation, and no further suspected serious injury left-turn crashes are observed. Success? Not exactly.


It turns out that the initial observation of two serious injury crashes in a single year was made during an extreme event, which was very unlikely to happen again. The second observation, following the signal phasing change, was made during a period when performance “regressed” back toward the mean or “normal” condition. As such, the one-year comparison does not show that the phase change mitigated the safety concern.


In reality, with regression to the mean it is very unlikely that extreme events will happen during successive observations. In other words, another serious injury left-turn crash would likely not have occurred in the subsequent year even if no changes had been made to the signal phasing.
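The effect in Example 1 can be reproduced with a small simulation. The sketch below is illustrative only: it assumes crash counts follow a Poisson distribution, that every site has the same true mean of 0.3 serious crashes per year, and that a site is flagged when it records two or more crashes in a year. These values and the function names are hypothetical and are not taken from any crash data set.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_untreated_sites(true_mean=0.3, n_sites=100_000, flag_at=2, seed=1):
    """Flag sites with an extreme first year, then observe them again untreated.

    Every site shares the same true mean (an assumption of this sketch), so
    any drop in the second year is regression to the mean, not a safety benefit.
    """
    rng = random.Random(seed)
    flagged_before = []
    flagged_after = []
    for _ in range(n_sites):
        year1 = poisson_sample(rng, true_mean)
        year2 = poisson_sample(rng, true_mean)  # nothing changed at the site
        if year1 >= flag_at:
            flagged_before.append(year1)
            flagged_after.append(year2)
    n_flagged = len(flagged_before)
    return (n_flagged,
            sum(flagged_before) / n_flagged,
            sum(flagged_after) / n_flagged)


n_flagged, mean_before, mean_after = simulate_untreated_sites()
print(f"flagged sites: {n_flagged}")
print(f"mean crashes in the flagged year: {mean_before:.2f}")
print(f"mean crashes in the following year (no treatment): {mean_after:.2f}")
```

Under these assumptions the flagged sites average at least two crashes in the year that drew attention and return to roughly the true mean the following year, even though nothing was changed at any site.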


Example 2: How Regression to the Mean Impacts Crash Projections

Continuing with the signal-controlled intersection example, two suspected serious injury left-turn crashes in the previous 3 years do not mean that such a crash will regularly occur about every 1.5 years.


When basing the safety benefit on a 3-year observed crash frequency, it is assumed that the 3-year observation represents a long-term average. In most cases this is not true. Crash events do not follow normal distributions. This is especially true for suspected serious injury and fatal crashes. These extreme crash events are described as rare and random events.
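A short calculation shows why a 3-year count can misrepresent the long-term average. The sketch below assumes, purely for illustration, that serious crashes at the site follow a Poisson distribution with a true long-term rate of 0.2 per year (one every 5 years). Both the distribution and the rate are hypothetical; real crash data are often more dispersed than a Poisson model implies.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least(k_min, mean):
    """Probability of k_min or more events when the expected count is `mean`."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


ASSUMED_LONG_TERM_RATE = 0.2  # hypothetical true rate, crashes per year
YEARS_OBSERVED = 3
CRASHES_OBSERVED = 2

# Rate implied by treating the 3-year count as the long-term average.
naive_rate = CRASHES_OBSERVED / YEARS_OBSERVED

# Chance of seeing two or more crashes in 3 years at the assumed true rate.
chance = prob_at_least(CRASHES_OBSERVED, ASSUMED_LONG_TERM_RATE * YEARS_OBSERVED)

print(f"naive rate: {naive_rate:.2f} crashes per year")
print(f"chance of {CRASHES_OBSERVED}+ crashes in {YEARS_OBSERVED} years: {chance:.1%}")
```

Under these assumptions, a site that truly averages one serious crash every 5 years still has roughly a 12 percent chance of recording two or more in any 3-year window, so the naive rate of one crash every 1.5 years would overstate the long-term risk more than threefold.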


Modeling crash events accurately requires statistical methods that are more advanced than simple averages and standard deviations. Unless it is accounted for, the regression to the mean phenomenon can cause analysts to overestimate or underestimate safety impacts.

Overcoming Regression to the Mean

Crash modeling overcomes regression to the mean by looking at multiple similar sites and employing statistical techniques that account for it. When possible, crash modeling should employ the following techniques:

  1. Predictive modeling methods are preferred over historic modeling methods alone.

  2. When possible, predictive modeling methods should be weighted with the observed crashes to obtain the expected crash frequency.

  3. Expected crashes should be separated into severity distributions for the specific sites being analyzed.

  4. Crash costs should be applied to the crash severity distributions from item 3 above.
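The four techniques above can be sketched in a few lines. The weighting shown is the Empirical Bayes form commonly associated with the HSM, in which the weight on the predicted value is 1 / (1 + k × predicted) and k is the overdispersion parameter of the prediction model; treating this as the intended weighting is an assumption. Every number below (predicted and observed crashes, overdispersion, severity shares, and unit costs) is hypothetical and should be replaced with values for the site type and agency in question.

```python
def empirical_bayes_expected(predicted, observed, overdispersion):
    """Blend predicted and observed crashes for the same site and period.

    weight   = 1 / (1 + overdispersion * predicted)
    expected = weight * predicted + (1 - weight) * observed
    """
    weight = 1.0 / (1.0 + overdispersion * predicted)
    expected = weight * predicted + (1.0 - weight) * observed
    return weight, expected


# Hypothetical inputs for one site over one study period.
PREDICTED = 1.2       # crashes predicted by a model built from similar sites
OBSERVED = 4          # crashes recorded at this site
OVERDISPERSION = 0.5  # overdispersion parameter of the prediction model

# Hypothetical severity shares and unit costs (not agency values).
SEVERITY_SHARES = {
    "fatal_or_serious_injury": 0.05,
    "minor_injury": 0.25,
    "property_damage_only": 0.70,
}
COST_PER_CRASH = {
    "fatal_or_serious_injury": 1_000_000,
    "minor_injury": 100_000,
    "property_damage_only": 10_000,
}

# Techniques 1 and 2: weight the prediction with the observed count.
weight, expected = empirical_bayes_expected(PREDICTED, OBSERVED, OVERDISPERSION)

# Technique 3: separate expected crashes by severity.
expected_by_severity = {
    severity: expected * share for severity, share in SEVERITY_SHARES.items()
}

# Technique 4: apply crash costs to the severity distribution.
crash_cost = sum(
    expected_by_severity[severity] * COST_PER_CRASH[severity]
    for severity in SEVERITY_SHARES
)

print(f"weight on prediction: {weight:.3f}")
print(f"expected crashes: {expected:.2f}")
print(f"expected crash cost: {crash_cost:,.0f}")
```

With these inputs the expected crash frequency lies between the predicted value and the observed count, which is the intended correction: an unusually high observed count is pulled back toward what similar sites experience.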


In some cases the analyst cannot use the predictive modeling methods from the Highway Safety Manual (HSM). In these situations, the analyst should consider the following steps in an effort to reduce regression to the mean bias:

  1. Use at least three years, preferably five, of historical crash data. If a significant geometric change was made during that time period, consider adjusting the time period to ensure that all the crashes are related to the changed geometric conditions.

  2. For before/after comparisons, use at least 3 years of observed data both before and after installation.

  3. Compare the observed crash history severity distribution with known severity distributions for the given site type. If the observed distribution is more or less severe than the known averages, the analyst should try to determine whether their results overestimate or underestimate the expected changes, although the magnitude of the error will not be easily determined.

  4. Safety evaluations with short time periods should note that results are preliminary and should be considered with caution.
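One simple way to carry out the comparison in step 3 is to ask how likely the observed number of severe crashes would be if the site followed the known severity share for its site type. The sketch below uses a binomial tail probability for that check. This is one possible approach, not a prescribed method, and the crash counts and the 5 percent reference share are hypothetical.

```python
import math


def binomial_tail(n, x, p):
    """Probability of x or more successes in n trials with success chance p."""
    return sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(x, n + 1)
    )


# Hypothetical crash history for one site.
TOTAL_CRASHES = 12
SEVERE_CRASHES = 3

# Hypothetical known share of severe crashes for this site type.
REFERENCE_SEVERE_SHARE = 0.05

observed_share = SEVERE_CRASHES / TOTAL_CRASHES
tail = binomial_tail(TOTAL_CRASHES, SEVERE_CRASHES, REFERENCE_SEVERE_SHARE)

print(f"observed severe share: {observed_share:.0%}")
print(f"reference severe share: {REFERENCE_SEVERE_SHARE:.0%}")
print(f"chance of {SEVERE_CRASHES}+ severe crashes at the reference share: {tail:.1%}")
```

A small tail probability indicates that the observed history is unusually severe compared with similar sites. That may reflect a real site-specific problem or an extreme period that will regress toward the mean, so a benefit calculation based on it should be treated with caution.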

Other Sources Explaining Regression to the Mean

Srinivasan, R., Gross, F., Bahar, G. (2016). Reliability of Safety Management Methods: Safety Effectiveness Evaluation, https://safety.fhwa.dot.gov/rsdp/downloads/fhwasa16040.pdf; See example on page 16.


Dilipkumar, D. (2020). Regression to the mean and its implications, Towards Data Science, https://towardsdatascience.com/regression-to-the-mean-and-its-implications-648660c9bf76


Veritasium (2013). How We’re Fooled By Statistics.


Herbel, S., Laing, L., McGovern, C. (2010). Highway Safety Improvement Program (HSIP) Manual, https://safety.fhwa.dot.gov/hsip/resources/fhwasa09029/sec2.cfm

FHWA https://safety.fhwa.dot.gov/tsp/fhwasa15089/appd.cfm