The goal of this data science capstone project has been to acquire national weather data in order to learn which areas of the U.S. have the least accurate weather forecasts and the possible reasons why. Specifically, we focused on errors in high and low temperature forecasts.
A major component of this project has been collecting and tidying the data because, to our knowledge, no large national dataset cleanly pairs weather observations with the forecasts made for them at various time intervals. This process has produced two datasets that continue to grow as new data is acquired. The first contains the observed and forecasted weather values. The second contains city-level variables, such as elevation and wind speed, that will later be used to explain the errors in temperature prediction. Because no comparable dataset is available in the academic space, the data we have produced and maintained is itself a valuable resource for further study.
With this cleaned data, we have begun analyzing the prediction errors to identify the factors that contribute to them. To do so, we have used various statistical tests as well as geospatial techniques such as map interpolation and kriging to explore which factors have the greatest impact on temperature forecast error. The end goal of the project is a model that predicts when a temperature forecast is likely to have a larger margin of error.
Clayton Strauch is a graduating senior from Altamont, Illinois, majoring in Mathematics and Data Science with minors in Computer Science and International Studies. He is currently interning with Pacific Northwest National Laboratory and is looking forward to starting his career as a data scientist after graduation. His hobbies include reading, coding, cooking, and traveling.
Lauren Schmiedeler is from Saint Louis, Missouri, and is majoring in Data Science and Mathematics. After graduation, she will spend the summer as a Retirement Actuarial Intern at Willis Towers Watson, and next fall she will begin Maryville University's Master's program in Actuarial Science.
Harrison Lanier is a Data Science major from Dallas, Texas. After graduation, he will pursue an MS in Artificial Intelligence at Saint Louis University. Harrison’s hobbies include spending time with his cats and playing basketball.
Sai Shreyas Bhavanasi is an international student from India pursuing a double major in Computer Science and Data Science. After graduating, he plans to work as a data scientist. His interests include coding, baking, and traveling.
The group would like to thank its faculty sponsor, Darrin Speegle, for supporting this project.