Every year, lightning causes millions of dollars in damage to land, housing, and crops. Although computer models have been applied very successfully to other types of weather prediction, lightning strikes, and therefore their effects, remain particularly difficult to predict. To address this, I used machine learning to build a model that predicts the likelihood of lightning strikes from meteorological data. I gathered historical lightning data from a public dataset on Google BigQuery. Because that dataset did not include weather data, I used the Meteostat API to retrieve meteorological data for the dates and locations in the lightning dataset, and I combined all of the information into a new CSV file. Days with lightning strikes were labeled 1 and days without strikes were labeled 0; this binary label was the target variable. The model's features were latitude, longitude, average temperature, precipitation, and wind speed. After splitting the new dataset 75/25 into training and testing sets, I trained a logistic regression model and computed its accuracy score. After oversampling to correct for class imbalance, the model achieved an accuracy of 91.22%. I also built a website for the model and compared its accuracy with that of linear regression, a Random Forest Classifier, and a support vector machine. A possible future study could examine how physical features such as mountain ranges affect cloud-to-ground lightning.
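The pipeline described above (binary day labels, a 75/25 split, oversampling, logistic regression, and an accuracy score) can be sketched with the Python standard library alone. This is a hypothetical illustration, not the project's code. The synthetic data generator and its rule linking warm, wet days to strikes are invented for the example. The oversampling is simple random duplication of the minority class, applied to the training split only, because the abstract does not say which oversampling method was used. Logistic regression is fit by plain batch gradient descent instead of a library such as scikit-learn's `LogisticRegression`. In the real project, each row would instead come from the BigQuery lightning records joined with Meteostat weather for the same date and location.

```python
import math
import random

# Feature order mirrors the five features named in the abstract.
FEATURES = ["latitude", "longitude", "avg_temp", "precipitation", "wind_speed"]


def make_synthetic(n=400, seed=1):
    """Hypothetical stand-in for the merged lightning + weather CSV."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        lat, lon = rng.uniform(25, 49), rng.uniform(-124, -67)
        temp, prcp, wspd = rng.uniform(-5, 35), rng.expovariate(0.2), rng.uniform(0, 40)
        # Invented rule: warm, wet days are more likely to have strikes (label 1).
        score = 0.15 * (temp - 20) + 0.3 * (prcp - 5) + rng.gauss(0, 1)
        rows.append([lat, lon, temp, prcp, wspd])
        labels.append(1 if score > 1.0 else 0)
    return rows, labels


def train_test_split(rows, labels, test_frac=0.25, seed=0):
    """Shuffle indices, then cut them into training and testing sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equally frequent."""
    rng = random.Random(seed)
    pos = [(r, y) for r, y in zip(rows, labels) if y == 1]
    neg = [(r, y) for r, y in zip(rows, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return rows, labels  # only one class present; nothing to balance
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = majority + minority + extra
    rng.shuffle(data)
    return [r for r, _ in data], [y for _, y in data]


def standardize(train, test):
    """Scale each feature using the training set's mean and standard deviation."""
    k = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
            for j in range(k)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]

    return scale(train), scale(test)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, rows):
    """Label a day 1 (lightning) when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
            for x in rows]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    X, y = make_synthetic()
    X_tr, y_tr, X_te, y_te = train_test_split(X, y, test_frac=0.25)
    X_tr, y_tr = oversample(X_tr, y_tr)  # balance the training split only
    X_tr, X_te = standardize(X_tr, X_te)
    w, b = fit_logistic(X_tr, y_tr)
    print(f"test accuracy: {accuracy(predict(w, b, X_te), y_te):.2%}")
```

Oversampling only the training split keeps the evaluation honest, because duplicated rows never leak into the test set. Since the test set stays imbalanced, accuracy is best read alongside the share of lightning days: a model that always predicts 0 can already score well when strike days are rare.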
Software demonstration video link: https://youtu.be/lOTkJvvqSEU