Predicting Hurricane Trajectories Using Geospatial Data
https://github.com/youngdeezy/capstone
Phase I
Introduction
Hurricanes are naturally occurring events that can cause great destruction in the regions they affect. Hurricane forecasting is therefore an important safety measure for any hurricane-prone region. Knowing in advance when a hurricane will arrive, and how intense it will be, allows these places to plan how to respond to the storm.
Traditional hurricane forecasting is an established method that has been in use for decades, but applying machine learning to the problem may open up new possibilities. Any improvement in forecasting helps these areas respond to storm events.
Objective
•Use geospatial data to analyze hurricane events
•Turn the hurricane data into a data set
•Use the data set to train a machine learning algorithm that predicts the paths of ongoing hurricanes
•Plot the generated hurricane paths on a chart so they can be compared with previous hurricanes
Goals
•Compare severity and paths of past hurricanes
•Make predictions on hurricane tracks
•Compare predicted results with actual hurricane data
•See how a machine learning model fares at predicting hurricanes compared to traditional forecasting
•Create an easily presentable visualization that will display generated storms as well as previous ones
Literature Review
Before approaching any problem, it is important to see whether previous work relates to it. Machine learning is applied across countless fields and industries, so I was able to find a few articles of interest. The two most important ones are from:
Sheila Alemany - Her group predicted the trajectory of hurricanes using Recurrent Neural Networks. The neural network is employed over a fine grid to reduce truncation errors. This technique is used to predict up to 120 hours of hurricane paths.
Albert Kahira - This work uses Convolutional Neural Networks. Its data are monthly averages of 6 weather variables (sea surface temperature, mean sea level pressure, sea ice cover, 2 metre pressure, U wind speed and V wind speed) from 1901 to 2010, provided by the earth science department of Barcelona Supercomputing Center.
Exploratory Data Analysis
Data points from Hurricane Wilma's storm track
The dataset that I obtained was a CSV file containing 34,415 rows of hurricane data. Each row contains standard hurricane statistics such as name, time, location, wind speed, and pressure.
All storms that were labelled as unnamed were removed from the dataset.
When the data scrubbing was completed, the DataFrame had 23,583 rows of hurricane data, meaning that 10,832 rows were removed.
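The scrubbing step above can be sketched as follows. This is a minimal illustration using only the standard library rather than the project's actual DataFrame code. The column names, the sample rows, and the "UNNAMED" label are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample in the shape described above. The column names, the
# values, and the "UNNAMED" label are assumptions, not the original data.
SAMPLE_CSV = """name,time,lat,lon,wind,pressure
UNNAMED,1950-08-12 00:00,25.1,-70.3,35,1008
WILMA,2005-10-19 12:00,17.3,-82.8,160,882
UNNAMED,1951-09-01 06:00,30.0,-60.0,40,1005
KATRINA,2005-08-28 18:00,26.3,-88.6,150,902
"""


def drop_unnamed(rows, name_field="name", unnamed_label="UNNAMED"):
    """Return only the rows whose storm name is not the unnamed label."""
    return [
        row for row in rows
        if row[name_field].strip().upper() != unnamed_label
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
named = drop_unnamed(rows)
removed = len(rows) - len(named)
print(f"kept {len(named)} rows, removed {removed}")
```

With pandas, the same filter would typically be a boolean mask on the name column, for example df[df["name"] != "UNNAMED"], again assuming that column name and label.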
Statistics of the dataset
Example of the geospatial dataset used to draw the map of the continental U.S.
Each row represents one U.S. state.
Visualization of Hurricane Katrina's track.
Wind speed intensity is shown by color: the redder the point, the higher the wind speed.
The visualization converts all the rows for a given hurricane into a GeoDataFrame, which can then be plotted onto a map as in the figure above.
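One way the color coding could work is sketched below. It is an assumption-laden illustration, not the project's plotting code: the yellow-to-red scale, the function name wind_to_color, and the default wind speed range are all hypothetical. The source only states that stronger winds appear more red.

```python
def wind_to_color(wind, min_wind=30.0, max_wind=160.0):
    """Map a wind speed to a hex color from yellow (weak) to red (strong).

    The default range is an assumed span of wind speeds, chosen only for
    illustration.
    """
    # Normalize to [0, 1] and clamp so out-of-range values stay valid.
    t = (wind - min_wind) / (max_wind - min_wind)
    t = max(0.0, min(1.0, t))
    # Red stays at full intensity; green fades out as the wind strengthens.
    green = round(255 * (1.0 - t))
    return f"#ff{green:02x}00"


# Hypothetical wind speeds along one storm track.
track_winds = [35, 60, 95, 140, 160]
colors = [wind_to_color(w) for w in track_winds]
print(colors)
```

Hex strings like these are a color format that Matplotlib accepts, so one color per point could be passed along when plotting the track.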
Planned Implementation
Using GeoPandas and Matplotlib, the program plots a hurricane's track onto a map of the U.S.
After the visualization is complete, the dataset is used to train and deploy a model that can predict the tracks of hurricanes
Predicted storm tracks will be plotted against their real counterparts using this program
Phase II
Now that the visualization side of the project is complete, it is time to approach the machine learning side. To get the best model predictions, it is important to find values that capture trends in the data. Some possible training labels include location, wind speed, pressure, and distance between each point.
Clustering
I clustered the entire dataset to see if there were any trends. Unfortunately, it is hard to get any information from this. It would be better to perform clustering on smaller, more localized datasets.
Distance
One measurement that may be used when training a model is the distance between consecutive points. I modified the dataset so that each row also holds the longitude and latitude of the previous row. With that, I was able to calculate the distance between the two points with the haversine formula, which takes the latitude and longitude of both points.
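The distance calculation described above can be sketched as follows. This is a minimal standard-library version under stated assumptions: the function names, the use of kilometers, and the sample track are illustrative, and the first point of a track is given a distance of 0 because it has no previous point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; kilometers are an assumed unit


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def step_distances(track):
    """Distance from each point to the previous one, for (lat, lon) pairs.

    The first point has no predecessor, so its distance is set to 0.0.
    """
    if not track:
        return []
    distances = [0.0]
    for (prev_lat, prev_lon), (lat, lon) in zip(track, track[1:]):
        distances.append(haversine_km(prev_lat, prev_lon, lat, lon))
    return distances


# Hypothetical three-point track, as (latitude, longitude) in degrees.
sample_track = [(25.0, -80.0), (25.0, -80.0), (26.0, -80.0)]
print(step_distances(sample_track))
```

Pairing each point with its predecessor through zip(track, track[1:]) plays the same role as copying the previous row's coordinates into each row of the dataset.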
Model
The model trained on the data was a Recurrent Neural Network (RNN), implemented with scikit-learn and TensorFlow.
Results
Although I was able to train and deploy a model, the results were less than optimal, and I clearly need to improve it before I can make any predictions.
The next step is to look for ways to improve the model's performance.
Phase III
After my suboptimal results with the RNN I deployed, I investigated possible ways that I can fix the problem and solve the main goals of the project. In this phase I was able to make great strides and deploy a model that predicts the track of hurricanes.
Grid System
After looking back at Sheila Alemany's use of grids to train her group's model, I decided to implement a grid system for training.
New Dataset
To improve the performance of the model, I decided to use a different hurricane dataset that was obtained from Unisys Weather.
It ranges from 1920 to 2012 and contains approximately 33,248 rows of hurricane data.
This dataset mostly contains standard hurricane data, with the addition of a unique-key value. The unique key combines the name of the storm, the year, and the hurricane number, and serves as a common identifier for training purposes. The distance and direction columns are used to implement the grid system. Each point is assigned to a specific location on the grid.
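Assigning a point to a grid location can be sketched as below. This is only an illustration under stated assumptions, not the project's implementation. It assumes a uniform latitude/longitude grid with 1-degree cells over a box roughly covering the North Atlantic, numbered row by row. The bounds, cell size, and numbering are hypothetical, and the sketch does not model how distance and direction enter the grid system.

```python
# Assumed grid: 1-degree cells over a box roughly covering the North Atlantic.
# None of these values come from the original project.
LAT_MIN, LAT_MAX = 0.0, 70.0
LON_MIN, LON_MAX = -110.0, 10.0
CELL_DEG = 1.0
N_ROWS = int((LAT_MAX - LAT_MIN) / CELL_DEG)
N_COLS = int((LON_MAX - LON_MIN) / CELL_DEG)


def grid_id(lat, lon):
    """Return the row-major index of the grid cell containing (lat, lon)."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        raise ValueError(f"point ({lat}, {lon}) is outside the grid")
    row = int((lat - LAT_MIN) // CELL_DEG)
    col = int((lon - LON_MIN) // CELL_DEG)
    return row * N_COLS + col


# Hypothetical storm positions, as (latitude, longitude) in degrees.
positions = [(25.5, -80.2), (26.1, -81.0), (27.4, -82.3)]
print([grid_id(lat, lon) for lat, lon in positions])
```

Turning each position into a single integer gives the model a discrete target to predict, at the cost of losing any detail finer than one cell.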
Chart showing all of the datapoints plotted onto a map
Model
The main difference with this RNN model is that grid locations are now being used as the label.
After being deployed, the model generated grid locations precise enough to mimic the behavior of their real counterparts, although there are some inaccuracies.
Chart comparing the predicted track of Hurricane Irene with its real counterpart
Chart of a data tuple and its grid location, comparing the predicted grid locations with their real counterparts
Conclusions
Due to the slight inaccuracies in my model's predictions, I believe that traditional forecasting is superior for the time being. This model requires further tuning to make better predictions before it can start being implemented with legitimate hurricane forecasting. However, these are some encouraging first steps to reaching that goal.
Due to time constraints, there is one thing I would have liked to work on further: seeing whether a generated storm track can be converted back into a form that can be plotted onto a map. Being able to compare the predicted and real tracks on an actual map would have made it easier to see the performance of the generated hurricane track.
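One possible approach to that conversion is sketched below. It is a sketch under stated assumptions, not something the project implemented. It assumes grid locations are row-major cell indices on a uniform latitude/longitude grid, so each index can be mapped to the center of its cell. The bounds, the cell size, and the function name are hypothetical.

```python
# Assumed grid definition: 1-degree cells, numbered row by row from the
# south-west corner. These values are illustrative, not from the project.
LAT_MIN, LAT_MAX = 0.0, 70.0
LON_MIN, LON_MAX = -110.0, 10.0
CELL_DEG = 1.0
N_ROWS = int((LAT_MAX - LAT_MIN) / CELL_DEG)
N_COLS = int((LON_MAX - LON_MIN) / CELL_DEG)


def grid_id_to_center(cell_id):
    """Return the (lat, lon) center of a row-major grid cell index."""
    if not 0 <= cell_id < N_ROWS * N_COLS:
        raise ValueError(f"cell id {cell_id} is outside the grid")
    row, col = divmod(cell_id, N_COLS)
    lat = LAT_MIN + (row + 0.5) * CELL_DEG
    lon = LON_MIN + (col + 0.5) * CELL_DEG
    return lat, lon


# Hypothetical predicted grid locations for one storm.
predicted_ids = [3029, 3148, 3267]
predicted_track = [grid_id_to_center(i) for i in predicted_ids]
print(predicted_track)
```

The recovered points are only accurate to within half a cell in each direction, so a plotted track would be a coarse approximation of the prediction.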
Works Cited:
Alemany, Sheila & Beltran, Jonathan & Perez, Adrian & Ganzfried, Sam. (2018). Predicting Hurricane Trajectories Using a Recurrent Neural Network. Proceedings of the AAAI Conference on Artificial Intelligence. 33. 10.1609/aaai.v33i01.3301468.
Asif, Amina & Dawood, Muhammad & Jan, Bismillah & Khurshid, J. & DeMaria, Mark & Minhas, Fayyaz ul Amir Afsar. (2020). PHURIE: hurricane intensity estimation from infrared satellite imagery using machine learning. Neural Computing and Applications. 32. 10.1007/s00521-018-3874-6.
Links cited to develop code:
https://heartbeat.fritz.ai/working-with-geospatial-data-in-machine-learning-ad4097c7228d
https://medium.com/@kap923/hurricane-path-prediction-using-deep-learning-2f9fbb390f18
https://www.datacamp.com/community/tutorials/geospatial-data-python
https://github.com/sheilaalemany/hurricane-rnn
https://www.freecodecamp.org/news/the-ultimate-guide-to-recurrent-neural-networks-in-python/