Student Projects
Spring 2023
The following are projects completed by students in EEPS-DATA 1720: Tackling Climate Change with Machine Learning at Brown University in Spring 2023.
The projects below are shared with permission of the student authors and should not be redistributed.
Predicting Installed Solar Panels Through Socio-Economic Data by Iris Huang & John Finberg
As climate change worsens, solar panels are an accessible and affordable solution for mitigation. To increase adoption of solar panels and solar energy use, it is important to not only understand where solar panel installations already exist, but also the socioeconomic factors which may influence the installation of solar panels in order to help motivate and overcome potential barriers to installation. In this project, we aimed to predict the number of existing solar panel installations in census tracts in the United States using socioeconomic data. Using data from Google’s Project Sunroof as well as the 2020 American Community Survey, we developed and compared the effectiveness of models created using multiple linear regression, a neural network, and a random forest. Our random forest model was the most successful of the set of models, with an R-squared coefficient of 0.81. Based on the results of our random forest model, we show that it is possible to predict the number of solar panel installations using socioeconomic data.
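For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an R-squared coefficient is computed from observed and predicted values. It is not the authors' code; the function name `r_squared` and the toy numbers are assumptions made for illustration only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative installation counts per census tract (made up, not project data).
observed = [1, 2, 3]
perfect_prediction = [1, 2, 3]
mean_only_prediction = [2, 2, 2]

perfect_score = r_squared(observed, perfect_prediction)
mean_only_score = r_squared(observed, mean_only_prediction)
```

A model that reproduces every observation scores 1, and a model that always predicts the mean scores 0, so a value of 0.81 indicates that most of the variance across tracts is explained.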
Machine Learning for Heatwaves: Predicting Extreme Heat in California and Nevada by Jed Bell & Jack Holmes
In this project, we experiment with the use of machine learning models to predict heatwaves. We focus our efforts on a geographic region that comprises most of California and Nevada, and we train our models on climate data from this region. We define extreme heat as daily temperatures in the top five percent for a given region, and use both random forests and neural networks to predict for each of the next ten days whether each region will experience extreme heat. For one-day-ahead predictions, the random forest models performed well in all three metrics: accuracy, precision, and recall. However, for ten-day-ahead predictions, performance decreased for all metrics, especially recall, which we consider the most important metric for this research question. With ten-day forecasts, our neural networks had better recall than the baseline random forest after using a novel curriculum learning protocol. Yet, both neural networks still lagged behind in overall accuracy. Overall, we believe this approach can work, but we need to consider more input variables in our models.
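The labeling rule and the evaluation metrics described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function names, the `top_percent` parameter, and the synthetic temperatures are all hypothetical, and days tied with the threshold are all labeled extreme.

```python
def extreme_heat_labels(temps, top_percent=5):
    """Label each day True if its temperature falls in the top `top_percent` of the series."""
    # Integer ceiling of len(temps) * top_percent / 100, with at least one extreme day.
    n_extreme = max(1, (len(temps) * top_percent + 99) // 100)
    threshold = sorted(temps, reverse=True)[n_extreme - 1]
    return [t >= threshold for t in temps]


def precision_recall(actual, predicted):
    """Precision and recall for boolean labels; returns 0.0 when a ratio is undefined."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic daily temperatures for one region (not real climate data).
temps = list(range(100))
labels = extreme_heat_labels(temps)

# A hypothetical forecast that catches one of two heat days and raises one false alarm.
precision, recall = precision_recall(
    actual=[True, True, False, False],
    predicted=[True, False, True, False],
)
```

Because only about five percent of days are positive, a model can reach high accuracy by rarely predicting extreme heat, which is one reason recall is a more informative metric for this task.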
Machine Learning Classification of Online Climate Misinformation by Livia Gimenes, Will Kattrup & Cali Rivera
The objective of this paper is to explore the capabilities of a supervised machine learning model that classifies climate change misinformation. We focused on training and fine-tuning the model originally developed by Coan et al., using the data from that same paper, and later applied the model to data that was scraped from different political subreddits. Out of the models that were experimented with, RoBERTa was found to be the best performer. While deploying the model to the Reddit datasets, two fine-tuned versions of RoBERTa were used. The models were first tested on a climate skeptics subreddit and a climate scientist subreddit. While the models identified more misinformation in the former subreddit than the latter, they were found to be over-predicting contrarian claims. That trend continued for the political dataset. Because of these results and the limited size of our dataset, we believe that it would be ill-advised to attempt to make any claims regarding trends in climate misinformation based on these results. For further work, we would like to explore additional methods of addressing class-imbalanced data as well as use a larger dataset for further exploration of trends in climate misinformation on Reddit. We would also like to explore further data pre-processing and transfer learning-based approaches to better adapt the model to novel data.
Clustering Senate Environmental Votes by Kai Zuang, Sanyu Rajakumar & Michelle Liu
In recent years, addressing climate change and environmental issues has become a critical challenge requiring effective legislative action. The United States Senate plays an important role in addressing this issue. This paper focuses on analyzing the voting patterns of senators in the United States Congress for environmental bills to better understand how the Senate shapes environmental policies. By employing unsupervised clustering methods, including spectral clustering with k-means, k-means, and hierarchical clustering, this paper seeks to identify clusters of senators with similar voting behaviors on environmental issues.
Understanding the dynamics within the Senate is crucial for environmental groups and activism. While there is a recent general trend of voting along party lines, the unique composition of the Senate, with two representatives from each state, allows for the formation of coalitions that can transcend party lines at times. We explore these conditions by examining the extent to which senators vote along party lines and identifying instances where senators “cross the line” in their voting behavior.
Our analysis utilizes a vote information dataset from the League of Conservation Voters (LCV), specifically the 2021 scorecard vote, which includes environmental bills and the corresponding votes of the senators from that year. In addition to clustering the Senate as a whole, we also cluster senators within the Democratic and Republican parties separately, which allows exploring coalitions within parties while mitigating the influence of party lines on voting.
Our findings will contribute to a deeper understanding of the voting patterns and behaviors of senators regarding environmental issues. Ultimately, our research aims to be a foundation for deeper analysis into targeted advocacy for climate-related policies within the Senate.
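To make the clustering idea in this abstract concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) applied to vote vectors. It covers only one of the three methods the authors mention and is not their implementation. The senator names, the 0/1 vote encoding, and the choice of starting centroids are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means; `centroids` are caller-supplied starting centers."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hypothetical votes on four bills: 1 = pro-environment vote, 0 = anti-environment vote.
votes = {
    "Senator A": [1, 1, 1, 1],
    "Senator B": [1, 1, 0, 1],
    "Senator C": [0, 0, 0, 0],
    "Senator D": [0, 1, 0, 0],
}
names = list(votes)
points = [votes[name] for name in names]

# Start from two senators with opposite records so the run is deterministic.
cluster_ids = kmeans(points, centroids=[points[0], points[2]])
clusters = dict(zip(names, cluster_ids))
```

Running the same procedure on the senators of a single party, as the abstract describes, removes the dominant party-line split and lets smaller within-party groupings show up.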
Natural Disasters and Human Migration in the US by Zain Asghar, Panos Syrgkanis, David Lauerman & Mason Lee
Despite the profound alterations that climate change is inflicting upon our planet, the Intergovernmental Panel on Climate Change (IPCC) posits that its most consequential impact will be felt in the domain of human migration. The escalation of natural disasters has already spurred internal migration within the United States, a tendency that is projected to persist and intensify. The catalysts for such migrations can be both direct, stemming from a specific natural disaster, and indirect, driven by the looming threat of future disasters. Unfortunately, our grasp of post-disaster migration patterns remains tenuous, hampering our capacity to respond effectively. This deficiency is exemplified by the protracted states of emergency and sluggish policy implementations that have characterized numerous disaster responses, such as in New Orleans following Hurricane Katrina. By harnessing comprehensive governmental datasets pertaining to annual population movements between counties and county-level disaster declarations, we aim to elucidate the correlation between these phenomena and forecast future population shifts. This will enable states and counties to better anticipate and accommodate population influxes and outfluxes.
We model the data with a graph structure, wherein counties are represented as nodes and population movements as edges. Our initial approach entails conducting network analyses on this expansive graph to discern baseline patterns and overarching trends. To generate the requisite population movement predictions, we must learn the inter-county relationships as well as the annual trends and their interactions with climate-related disasters. To accommodate these multifaceted factors, we employ three variations of Graph Neural Networks (GNNs): spatiotemporal message passing neural network (LSTM-MPNN), non-temporal MPNN, and temporal diffusion convolution, which facilitate the learning of patterns within the data and the formulation of predictions for the current or upcoming year. Through these methods, we were able to identify qualitative migration differences and establish a pipeline which, with further work, could yield our desired results.
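The graph representation described above, with counties as nodes and population movements as directed edges, can be sketched in a few lines. This is only an illustration of the data structure and a simple baseline statistic, not the authors' GNN pipeline. The county names and mover counts are invented, and `build_graph` and `net_migration` are hypothetical helper names.

```python
# Hypothetical directed flows: (origin county, destination county, number of movers).
flows = [
    ("County A", "County B", 120),
    ("County B", "County A", 45),
    ("County A", "County C", 30),
]


def build_graph(flows):
    """Adjacency mapping: graph[origin][destination] = total movers along that edge."""
    graph = {}
    for origin, destination, movers in flows:
        edges = graph.setdefault(origin, {})
        edges[destination] = edges.get(destination, 0) + movers
        graph.setdefault(destination, {})  # keep counties that only receive movers
    return graph


def net_migration(graph):
    """Inflow minus outflow for every county in the graph."""
    net = {county: 0 for county in graph}
    for origin, edges in graph.items():
        for destination, movers in edges.items():
            net[origin] -= movers
            net[destination] += movers
    return net


graph = build_graph(flows)
net = net_migration(graph)
```

Per-county totals like these are the kind of baseline pattern a network analysis can surface before any learned model is applied. A message passing network would additionally propagate node features, such as disaster declarations, along the same edges.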
Proposal: A Time Series Analysis Approach to Predicting Coastal Erosion Along the Beaufort Sea by Joseph Pate
Coastal erosion is an imminent and critical issue facing coastlines globally. However, due to various climatic factors and geographic location, some areas, known as erosional hotspots, are more prone to coastal erosion than others. One such hotspot is Alaska’s Arctic Coast. In this research proposal I investigate and perform analysis on the remote sensing and atmospheric/oceanic data collected from this region. I then propose that a time-series analysis approach to forecasting shoreline change be performed using a simple feed-forward network and an Error, Trend, Seasonality (ETS) model. I explore what this might look like and give recommendations on how to best approach this problem.