This article explains which date ranges and durations to consider when using crash data. For an explanation of how crash data is submitted and processed, please see the Crash Data Overview article.
As a general guideline, any crash data analysis or reporting effort should include five years of complete, locked crash data. The supporting rationale and potential exceptions are described below.
Five years of locked data typically captures a representative sample and reduces the risk that a single event or short-term trend will unduly influence results. Longer periods are generally not recommended because they are more likely to include geometric, operational, or local changes that affect driver behavior. Regression-to-the-mean bias is still present with five years of data, but a five-year period reduces its impact. A predictive analysis is the preferred method for addressing regression-to-the-mean bias.
Potential Exceptions
Recent geometric or operational changes
If significant operational or geometric changes have recently been made in the study area, a shorter timeframe may be considered. In these situations, three years is the minimum period that should be considered.
Low volume roads
On low-volume roads, more than five years of data may be necessary to obtain a useful number of crashes. Five years may include only a few events, so a longer period is desirable as long as the data is not affected by geometric changes.
Even with longer date ranges, low-volume (low-incident) roadways will always be subject to regression-to-the-mean bias.
All crash data goes through a quality control (QC) process that corrects errors and identifies the exact geolocation of the crash. It typically takes 4-6 months for all crashes to be submitted by local agencies and fully QC’d. Because of this timeline, crash data for a given year should not be used until July 1st of the following year. Waiting six months into the following year ensures that all records have been submitted and that all submitted records are correctly geolocated with accurate attributes. As a general rule, do not use data for crashes that occurred in the current year. For example, an analysis performed in March 2023 should use January 2017 - December 2021 data; an analysis performed in August 2023 should use January 2018 - December 2022 data.
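The July 1st rule can be sketched as a small date calculation. This is only an illustration of the guideline above, not part of any UDOT tool; the function name `recommended_range` and its `years` parameter are hypothetical.

```python
from datetime import date


def recommended_range(today: date, years: int = 5) -> tuple[date, date]:
    """Return (start, end) of the most recent `years` usable calendar years.

    Assumes the guideline above: a calendar year's crash data becomes
    usable on July 1st of the following year.
    """
    if today >= date(today.year, 7, 1):
        # On or after July 1st, the previous calendar year is usable.
        last_year = today.year - 1
    else:
        # Before July 1st, fall back one more year.
        last_year = today.year - 2
    first_year = last_year - years + 1
    return date(first_year, 1, 1), date(last_year, 12, 31)


# The two examples from the text:
print(recommended_range(date(2023, 3, 15)))  # January 2017 - December 2021
print(recommended_range(date(2023, 8, 15)))  # January 2018 - December 2022
```

Passing `years=3` gives the minimum window mentioned for locations with recent geometric or operational changes.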
Potential Exceptions
Fatal crashes
Fatal crashes receive increased scrutiny and are submitted to the database system more quickly. Fatal crash summaries can typically be completed at the beginning of the new year. However, site-specific analysis should not use fatal crashes before July 1st because they may not yet be fully QC’d and amendments from investigating officers may still be submitted. Information such as driver condition (DUI) and final injury severity is often not reported for a few months.
Recent crashes
Recency bias is common when analyzing a location that experienced a recent significant or highly publicized crash event. It can lead to unsupported decisions based on incomplete or incorrect data. Recent events can be considered for context, especially when they point out obvious deficiencies, but a significant recent event does not warrant an exception to the guidelines stated earlier in this article.
Significant correlated trends
Operational changes, such as left-turn phasing or construction traffic control, may result in an immediate spike or change in crash trends. When several unusual crashes happen at a location and can confidently be attributed to a recent change, it is not necessary to wait for more data or to avoid using recent data before correcting the problem.
Locking crash data means that no new records or amendments to records will be accepted in the database system. This ensures that annual state and federal reports have final numbers that will not change as records are submitted or updated.
Severity 1-4 (ABCO) crashes are locked for the previous year on July 1st. This is consistent with the recommended six-month wait period. However, due to federal requirements, fatal crashes (severity 5, K) are locked one year after the end of the calendar year in which the crash occurred, meaning any fatal crash in 2022 will be locked on January 1st, 2024. Even though fatal crashes remain unlocked for an additional six months, changes or additions are rare.
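The two locking schedules can be expressed as a short lookup. This is a minimal sketch of the dates described above, assuming severity is coded 1-5 with 5 as fatal; the function names `lock_date` and `is_locked` are hypothetical and do not come from the database system.

```python
from datetime import date


def lock_date(crash_year: int, severity: int) -> date:
    """Return the date on which crashes from `crash_year` are locked.

    Assumes severity 1-4 (ABCO) locks on July 1st of the following year
    and severity 5 (K, fatal) locks on January 1st two years later.
    """
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    if severity == 5:
        return date(crash_year + 2, 1, 1)
    return date(crash_year + 1, 7, 1)


def is_locked(crash_year: int, severity: int, today: date) -> bool:
    """True once the lock date for this crash year and severity has arrived."""
    return today >= lock_date(crash_year, severity)


print(lock_date(2022, 3))  # 2023-07-01
print(lock_date(2022, 5))  # 2024-01-01
```

With these dates, a 2022 fatal crash checked in August 2023 is still unlocked, while a 2022 severity 1-4 crash is already locked.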
UDOT’s crash data analysis system (AASHTOWare Safety powered by Numetric) includes a filter “Crash Verified” = true/false. This indicates whether the crash has been reviewed and geolocated in the QC process. It DOES NOT ensure that all crashes have been reported for that period, only that the crash has been reviewed, QC’d, and geolocated. Please note that after a crash is reviewed, it may still be amended by the officer until the data has been locked. Any time recent data is used, it should include only data that has been verified in the QC process.
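When working with exported records outside the analysis system, the same restriction can be applied as a simple filter. The sketch below assumes each record is a dictionary with a boolean `crash_verified` field; that field name, `crash_id`, and the sample records are hypothetical and are not the system's actual export schema.

```python
def verified_only(crashes: list[dict]) -> list[dict]:
    """Keep only records whose (hypothetical) `crash_verified` flag is True.

    Records with a missing or false flag are dropped, since they have not
    been reviewed and geolocated in the QC process.
    """
    return [c for c in crashes if c.get("crash_verified") is True]


# Made-up sample records for illustration only.
sample = [
    {"crash_id": "A", "crash_verified": True},
    {"crash_id": "B", "crash_verified": False},
    {"crash_id": "C"},  # flag missing: treated as unverified
]

print([c["crash_id"] for c in verified_only(sample)])  # ['A']
```

As noted above, a verified flag says nothing about completeness, so this filter does not replace the six-month wait for a full year of data.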