Forest fires and wildfires are natural disasters that continue to make national and global news, including the recent fires across Australia that killed over a billion animals. Such fires can be very difficult to predict because many different random events, such as lightning, electrical failures, smoking, or arson, are potential causes. Due to the unfortunate circumstances surrounding these fires, they are often difficult to contain and can end up costing governments millions of dollars in relief, as well as tragic losses of human life, fauna, and flora.
Inspired by this real-world problem, we set out to build a prediction model that evaluates whether a wildfire in the United States can be contained within its local confines or will require external aid. Below are our findings.
The dataset, found at https://www.kaggle.com/rtatman/188-million-us-wildfires, contains information on 1,880,465 US wildfires from 1992 to 2015, ranging in duration from zero to 397 days (12 data rows were corrected to fix typos, cross-referenced online). Across 40 columns, the information provided includes the location of each fire (latitude, longitude, US county, and state), date and time of discovery, date and time of containment, overall size of the fire (also grouped into seven bins by size), and other fields not used in this assessment. The dataset was cleaned by removing NA values, reducing it to 597,998 rows, on which exploratory data analysis and predictive modeling were performed.
Additional data sources (links not preserved here) were used for plotting data on the US map, for the raw data on the fires, and for the cost of fire per acre.
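As a rough illustration of the loading and cleaning step described above, here is a minimal pandas sketch. It assumes the Kaggle SQLite export with a `Fires` table and column names such as `DISCOVERY_DATE` and `CONT_DATE` (stored as Julian day numbers); exact table and column names may differ and are not confirmed by this write-up.

```python
# Sketch of loading and cleaning the Kaggle 1.88M US Wildfires dataset.
# Table/column names are assumptions based on the public SQLite release.
import sqlite3
import pandas as pd

conn = sqlite3.connect("FPA_FOD_20170508.sqlite")  # file from the Kaggle download
fires = pd.read_sql_query(
    """SELECT LATITUDE, LONGITUDE, STATE, FIPS_NAME,
              DISCOVERY_DATE, DISCOVERY_TIME, CONT_DATE, CONT_TIME,
              FIRE_SIZE, FIRE_SIZE_CLASS
       FROM Fires""",
    conn,
)
conn.close()

# Drop rows with missing values, mirroring the reduction to ~598k rows.
fires = fires.dropna()

# DISCOVERY_DATE and CONT_DATE are Julian day numbers; convert them to
# datetimes and derive each fire's duration in days.
fires["DISCOVERY_DATE"] = pd.to_datetime(fires["DISCOVERY_DATE"], unit="D", origin="julian")
fires["CONT_DATE"] = pd.to_datetime(fires["CONT_DATE"], unit="D", origin="julian")
fires["DURATION_DAYS"] = (fires["CONT_DATE"] - fires["DISCOVERY_DATE"]).dt.days
```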
Code and other files can be found on GitHub: https://github.com/mmc20/FireDatathon