Models Implemented

Answering Our Research Questions Using Data Modeling & More

We used a combination of modeling, further graphing, and outside research to answer our ten questions, and we answered them out of order, as some questions needed to be addressed first to provide background for later ones. The models required minimal data remodeling, but some reconfiguration of our code was needed.

What have been the past patterns of El Niño and La Niña intensity? 

We can see, based on the current output, the following trends:

Have recent environmental efforts such as adaptation, geo-engineering, or mitigation strategies produced any significant changes regarding the climate? 

We found this question best answered by summarizing research from outside sources, as its scope was largely beyond what we could predict accurately with our data.


The Pacific ENSO Applications Center (PEAC)'s strategy is to foster collaborative efforts to overcome these adversities, prepare the necessary mitigation and adaptation strategies, and improve public awareness of oceanic regions (Schroeder et al. 2012). Island climate change preparedness measures include coastal protection structures such as sea walls, elevated homes, improved agroforestry, reef restoration, planting native plants to reduce beach erosion, better fishing technologies and equipment, and diversified (salt-tolerant) crops (Mycoo et al. 2022). 

Mitigation Strategies:

Adaptation Strategies:


Further comment comes from a paper in the academic journal Environmental Chemistry Letters, which, after examining over ten conventional and alternative methods of climate change mitigation, concluded that current efforts were insufficient to meet the goals set by the 2015 Paris Agreement, and that further development of alternative methods was critical to future climate mitigation efforts (Fawzy et al. 2020).

In short, based on both the literature and the graph above, which shows renewable energy at a mere 12% of the U.S. energy mix in 2020, recent environmental efforts have failed to produce a significant change or mitigation in climate change, whether in our chosen region of Polynesia/Oceania or around the world. If countries with the wealth and innovation to create new energy sources and climate efforts are failing, then smaller island nations, which sometimes lack even the critical infrastructure to protect themselves from current climate risks, are highly unlikely to succeed.

How have El Niño and La Niña years impacted the Oceanic regions? 

To answer this question, a graph was made for each of the 3 main regions of study: Polynesia, Micronesia, and Melanesia.

The graph for Polynesia shows the outputs for changes in the El Niño and La Niña outputs, as well as the overall ENSO cycle change. We can see that:

For Micronesia, we can see that:

For Melanesia, we can see:

Overall, the data showcases:

High temperatures in past data, especially during El Niño years, with a trend toward temperature increases of 0.5 to 1.5 degrees Celsius.

Depending on the degree to which climate change was to be altered in the future, would ENSO severely affect the oceanic regions? If so, to what extent? And based on predictive data, which Oceanic regions would be the most and least impacted? 

As with the last question, there is a set of graphs for each of the three main regions, beginning with Polynesia:

The graphs for Polynesia showcase:

The data for Micronesia shows:

For the future Melanesia data:

Overall, this tells us that the increase in temperature demonstrated by past data is predicted to continue into the future, with potentially larger temperature increases, hotter regions overall, and less variation between the El Niño and La Niña seasons as the effects of climate change continue to grow.


Based on all of this data, we can estimate that of the three given regions, Melanesia is likely to be the most impacted, due to its greater warming as well as reduced cooling over larger land masses such as New Guinea, which suggests that the smaller land masses around the area are likely to be worse off as well.

Linear Regression Model

For the Linear Regression and Logistic Regression models, the data had to be normalized using the StandardScaler class from sklearn.preprocessing (via its fit_transform method), similar to our past assignments and classwork. Below is a snapshot of the non-normalized data:

And below is the normalized data:

Finally, we have a printed snippet of the data after it was divided into a training set and a testing set, with the predictors (the weather features and precipitation) and the outcome indexed separately. The outcome column was not scaled, since it is the feature being predicted:
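A minimal sketch of this preprocessing step on synthetic stand-in data (the array shapes and random values here are illustrative assumptions, not the project's actual dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.random((100, 2))  # stand-in for the weather feature columns
y = rng.random(100)       # stand-in for the (unscaled) precipitation outcome

# Split first, then fit the scaler on the training set only, so no
# information from the test set leaks into the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit + transform on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0))  # each column now has ~zero mean
```

Fitting the scaler on the training split and only transforming the test split is the standard way to keep the evaluation honest.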

One of the model types we utilized to examine our data was a linear regression model. For computations performed to clean and normalize the data, please see the Visualizations/Cleaning tab.

To evaluate the effectiveness of each model, we used accuracy: the percentage of correct classifications a trained model achieves. Each model is also programmed to search for its best hyperparameters (external configuration variables) as extra information. The following is the output for the linear regression model:

Polynesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.1240711426293525

Best Hyperparameters: {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}

===========

Micronesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.28731666413606766

Best Hyperparameters: {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}

===========

Melanesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.2070017014339661

Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}

===========

This output reports each region's Development Set Mean Absolute Error (MAE), which is an error metric rather than an accuracy: lower values are better, so the Polynesia model (MAE ≈ 0.124) performs best and the Micronesia model (MAE ≈ 0.287) worst. Even so, the linear models do not fit the data well. This was somewhat anticipated, given that linear models are poor at predicting non-linear data, which temperature and atmospheric data almost always are. However, we tried a linear regression model anyway to test ourselves and the data: if a linear pattern had emerged, it might have hinted at errors in the data, or at an unusual linear relationship we could extrapolate for our conclusion. Neither outcome appeared here.
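As a reference point, MAE for any fitted regressor can be computed as below; the data here is synthetic (with a deliberately non-linear component), not the project's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 2))
# Synthetic target with a non-linear sine component, standing in for
# atmospheric data that a linear model cannot fully capture.
y = X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 200)

X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# MAE averages the absolute prediction errors; lower is better.
mae = mean_absolute_error(y_dev, model.predict(X_dev))
print(f"Development Set Mean Absolute Error (MAE): {mae:.4f}")
```

Because the sine term is invisible to a straight-line fit, the MAE stays well above the noise floor, mirroring the poor linear fits seen above.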

Logistic Regression Model

For the logistic regression model, the base dataset was split into a new training and testing pair, and indexed based on weather features and the outcome. Precipitation in particular was binarized based on the mean of the training set, and a new column was created for the data based on the classification outcome of the model. The following is the output of the training data, and the x and y sets for the training data:

The Logistic Regression model was our second model type, created using cleaning methods similar to those used for the Linear Regression model.


The outcomes for each region, based on the logistic regression model, are:

Results

Polynesia:

Best Hyperparameters: {'C': 10, 'penalty': 'l2'}

Accuracy on Test Data: 0.5138339920948617

=========== 

Micronesia:

Best Hyperparameters: {'C': 0.1, 'penalty': 'l2'}

Accuracy on Test Data: 0.549407114624506

===========

Melanesia:

Best Hyperparameters: {'C': 1, 'penalty': 'l2'}

Accuracy on Test Data: 0.4743083003952569

===========

Going off the accuracy, we can see a significantly higher correct-prediction rate compared to the linear regression model, with the highest accuracy for Micronesia at about 55%. Curiously, each region has a different ideal value of C (10, 0.1, and 1, respectively), which may reflect how each region's data responded to the model.
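The binarization-and-search pipeline described above can be sketched as follows; the synthetic data and grid values are illustrative, though the searched hyperparameters (C and penalty) match the outputs shown:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.random((300, 2))                          # stand-in weather features
precip = X[:, 0] * 2 + rng.normal(0, 0.3, 300)    # stand-in precipitation

X_train, X_test, p_train, p_test = train_test_split(
    X, precip, test_size=0.25, random_state=1
)

# Binarize precipitation against the *training-set* mean, as described above,
# so the threshold doesn't leak test-set information.
threshold = p_train.mean()
y_train = (p_train > threshold).astype(int)
y_test = (p_test > threshold).astype(int)

# Search over the same hyperparameters reported in the output: C and penalty.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10], "penalty": ["l2"]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)
print("Best Hyperparameters:", grid.best_params_)
print("Accuracy on Test Data:", grid.score(X_test, y_test))
```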

What are the primary meteorological variables to severely impact the oceanic regions based on magnitude? 

The above graphs answer this question by fitting recursive feature elimination (RFE) on the linear regression model with the normalized data, and graphing the resulting importance calculated for each feature. We can see a similar trend for each region: WSPDSRFAV, or average surface wind speed, is almost statistically insignificant compared to the importance of TS, or temperature, as a predictor. This makes sense, as temperature affects both land masses and the ocean, is a primary factor in storm formation, and drives many socioeconomic factors such as farming or fishing seasons. Below are the exact values for each feature's importance:

Polynesia

[0.05100927 0.94899073]

===========

Micronesia

[0.10951039 0.89048961]

===========

Melanesia

[0.06755424 0.93244576]

===========
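The ranking step above can be sketched like this; the synthetic data, coefficients, and the choice to derive sum-to-one importances by normalizing absolute regression coefficients are all assumptions for illustration, not the project's exact code:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Two synthetic predictors standing in for WSPDSRFAV (wind) and TS (temperature).
wind = rng.normal(5, 1, 200)
ts = rng.normal(25, 2, 200)
y = 0.05 * wind + 0.9 * ts + rng.normal(0, 0.5, 200)
X = np.column_stack([wind, ts])

# RFE ranks features by repeatedly dropping the weakest (smallest-|coef|) one.
rfe = RFE(LinearRegression(), n_features_to_select=1).fit(X, y)
print("RFE ranking (1 = kept last):", rfe.ranking_)

# One way to get importances that sum to 1, as in the printout above,
# is to normalize the absolute regression coefficients.
coefs = np.abs(LinearRegression().fit(X, y).coef_)
rel = coefs / coefs.sum()
print("Relative importance [wind, ts]:", rel)
```

As in the real outputs, the temperature stand-in dominates and the wind stand-in contributes only a small fraction of the total importance.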

How does the historical reanalysis data of the Oceanic Regions and their associated islands compare to its predictive data? 

To answer this question, we had to create our final two models: a feedforward neural network (FNN) and a convolutional neural network (CNN). An FNN is a broad type of neural network with only one direction of information flow: forward. A CNN is an optimized FNN characterized by its use of filters, or kernel optimization. The creation and code snippets can also be seen on the Visualizations tab. First, a training set, testing set, and validation set were created from our collected data, which were then split into features and outcomes. Then, the data was normalized. Below is a view of the neural network data:

Feedforward Neural Network

The FNN is created using stacks of 'dense' and 'dropout' layers, which are fed data images that are flattened into vectors. Below is a snippet of the image 'vector', though it isn't very exciting:

And below is a printout of one model's output from the function that created it, listing its layers and parameters. There are several models; this is just one, to illustrate the output:

And then, we have the outputs for each model, for each region we examined with the data, giving the hyperparameters, and the accuracy for the training and validation sets:

Polynesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 732.9714, Training Accuracy: 27.0733

Validation Loss: 765.2873, Validation Accuracy: 27.6637

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 713.6483, Training Accuracy: 26.7141

Validation Loss: 744.9549, Validation Accuracy: 27.2937

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 810.1848, Training Accuracy: 28.4636

Validation Loss: 846.2267, Validation Accuracy: 29.0898

Number of Parameters 3: 415425

Micronesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 846.7216, Training Accuracy: 29.0984

Validation Loss: 882.0287, Validation Accuracy: 29.6989

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 844.9132, Training Accuracy: 29.0673

Validation Loss: 880.2386, Validation Accuracy: 29.6688

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 919.4033, Training Accuracy: 30.3216

Validation Loss: 957.8805, Validation Accuracy: 30.9496

Number of Parameters 3: 415425

Melanesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 824.6379, Training Accuracy: 28.7165

Validation Loss: 861.5817, Validation Accuracy: 29.3527

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 859.6379, Training Accuracy: 29.3195

Validation Loss: 898.3053, Validation Accuracy: 29.9717

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 825.5211, Training Accuracy: 28.7318

Validation Loss: 862.7159, Validation Accuracy: 29.3720

Number of Parameters 3: 415425


We can see from the above outputs that accuracy actually tends to be slightly higher on the validation sets than on the training sets. Overall, the accuracy rate is similar to the linear regression model's, which is surprising, but could be due to the data's input format, or to the nature of the values as atmospheric data, a good portion of which is itself predictive in nature. 
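For illustration, a dense/dropout stack like the "Hyperparameters 1" configuration above could be sketched in Keras as follows. The input width of 4 features, the L1 strength, and the compile settings are assumptions, not the project's actual values:

```python
import keras
from keras import layers, regularizers

def build_fnn(num_nodes, dropout, activation="linear", input_dim=4):
    """Stack of Dense + Dropout pairs, one pair per entry in num_nodes."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for nodes, rate in zip(num_nodes, dropout):
        model.add(layers.Dense(nodes, activation=activation,
                               kernel_regularizer=regularizers.L1(0.01)))
        model.add(layers.Dropout(rate))
    model.add(layers.Dense(1))  # single regression output
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Mirrors the "Hyperparameters 1" configuration from the output above.
model = build_fnn(num_nodes=[10, 8, 8], dropout=[0.1, 0.1, 0.2])
model.summary()  # lists layers and parameter counts, as in the printout
```

With a 4-wide input the parameter count is tiny; the much larger counts in the real outputs reflect the actual input dimensionality of the project's data.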

Convolutional Neural Network

This code cleaning can also be seen on the Visualization/Cleaning tab. The CNN is trained very similarly to an FNN, but adds convolutional layers, with more emphasis on robust dropout layers. Additionally, the data isn't flattened, but is instead reshaped to approximate the original raw 3D form in which we imported it. A snippet of the resulting vectors, as interpreted by Colab, is below:

Below there's also an output for the CNN, to demonstrate it in contrast to the FNN:

And here are the accuracy ratings and parameter numbers for each CNN:

Polynesia

Training Loss: 0.0334, Training Accuracy: 0.0561

Validation Loss: 0.0336, Validation Accuracy: 0.0499

Number of Parameters 1: 1512705


Training Loss: 0.0335, Training Accuracy: 0.0531

Validation Loss: 0.0335, Validation Accuracy: 0.0536

Number of Parameters 2: 293497


Training Loss: 0.0337, Training Accuracy: 0.0848

Validation Loss: 0.0334, Validation Accuracy: 0.0762

Number of Parameters 3: 556257

Micronesia

Training Loss: 0.0587, Training Accuracy: 0.1628

Validation Loss: 0.0601, Validation Accuracy: 0.1699

Number of Parameters 1: 1512705


Training Loss: 0.0323, Training Accuracy: 0.0704

Validation Loss: 0.0313, Validation Accuracy: 0.0712

Number of Parameters 2: 293497


Training Loss: 0.0326, Training Accuracy: 0.0826

Validation Loss: 0.0317, Validation Accuracy: 0.0827

Number of Parameters 3: 556257

Melanesia

Training Loss: 0.0511, Training Accuracy: 0.1335

Validation Loss: 0.0538, Validation Accuracy: 0.1469

Number of Parameters 1: 1512705


Training Loss: 0.0334, Training Accuracy: 0.0562

Validation Loss: 0.0323, Validation Accuracy: 0.0562

Number of Parameters 2: 293497


Training Loss: 0.0351, Training Accuracy: 0.0426

Validation Loss: 0.0341, Validation Accuracy: 0.0438

Number of Parameters 3: 556257


One thing to note that isn't shown here is that the CNN models can take further hyperparameters, potentially increasing the output's accuracy. Those hyperparameters aren't shown due to the constraints of the website's text boxes, as they're quite long strings of parameters.
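As a rough illustration of the reshaping and convolutional stack described above, here is a minimal CNN over a small assumed grid; the (8, 8, 1) shape, layer sizes, and synthetic data are all assumptions made purely for this sketch:

```python
import numpy as np
import keras
from keras import layers

rng = np.random.default_rng(3)
# Hypothetical (samples, height, width, channels) shape; the real data's
# raw 3D form isn't shown, so an 8x8 single-channel grid is assumed.
X = rng.random((32, 8, 8, 1))
y = rng.random(32)

model = keras.Sequential([
    keras.Input(shape=(8, 8, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # learned filters
    layers.MaxPooling2D(),                                # downsample feature maps
    layers.Dropout(0.2),                                  # dropout, as noted above
    layers.Flatten(),
    layers.Dense(1),                                      # regression output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=1, verbose=0)
```

The key contrast with the FNN is the front of the stack: the input keeps its spatial shape so the convolution's filters can slide over it, and flattening happens only just before the dense output.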


In terms of answering the question, this shows us that reanalysis of the historical data, despite the earlier similarities seen in the regional graphs, doesn't seem very accurate at predicting future data. For the predictions to be significant at all, we would expect accuracy values around 0.7 to 0.8, or at least 0.5, but most models barely scratch 0.2. This may be due to the reshaping of the data, or to the limitations of FNNs and CNNs as 'general' neural networks.

To what extent does climate change influence the magnitude of El Niño and La Niña in a warmer climate? 

This question was answered by separating the El Niño, La Niña, and ENSO cycle data and indexing it by both future and past climatology, in order to build an accurate picture of how these cycles would play out in a warmer climate:

Based on the future data, we can expect spikes in La Niña cycles for the next two decades, after which the El Niño cycle would take over as more 'dominant' in influencing the climate. This is likely because, as the world continues to warm, climate change pushes the ENSO cycle toward a stronger El Niño, the part of the cycle that primarily controls warmer years. Meanwhile, if we base these predictions on past data, we see a much more dramatic swing toward El Niño dominance, likely because the lower temperatures of past decades make the future changes look more drastic. From this, we can predict a future that continues toward a warmer, more unstable climate.
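The indexing described above amounts to splitting the records by climatology window and comparing phases; a hedged pandas sketch, where the column names ('year', 'phase', 'anomaly') and values are illustrative stand-ins, not the project's dataset:

```python
import pandas as pd

# Illustrative stand-in records; the real column names differ.
df = pd.DataFrame({
    "year":    [1998, 2005, 2016, 2035, 2050, 2070],
    "phase":   ["El Nino", "La Nina", "El Nino", "La Nina", "El Nino", "El Nino"],
    "anomaly": [1.4, -0.9, 1.6, -1.1, 1.9, 2.2],
})

# Index by past vs. future climatology windows.
past = df[df["year"] <= 2024]
future = df[df["year"] > 2024]

# Compare mean anomaly by ENSO phase within each window.
print(past.groupby("phase")["anomaly"].mean())
print(future.groupby("phase")["anomaly"].mean())
```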

Based on predictive data, what are some socio-economic aspects of the Oceanic regions that could be worsened by climate change? 

This question can't be answered by creating more graphs or models, but by examining our existing data, graphs, and models for signs of which factors might be worsened by climate change. For starters, climate change interrupts the established pattern of seasons that many regions rely on for growing food, as well as the growing periods for wild vegetation. Changes in those periods can in turn affect wildlife, often by depriving them of food. Regions with too many hot days, too many cold days, too many days without rain, or too many days with rain can lose the crops and vegetation critical to them; the climate can be summed up by the phrase 'everything in moderation', since too much of any one variable, be it wind, temperature, or rainfall, can be devastating. Increased temperatures also melt sea ice and raise sea levels, which are especially harmful to oceanic regions: their low elevations and long coastlines are easily swallowed by encroaching oceans, depriving residents of land to live and work on. An erratic climate can also lead to more typhoons and other tropical storms, damaging infrastructure the area depends on. A report released by the Center for Global Development reviewed potential socioeconomic issues that could result from climate change in the coming decades and found evidence of threats to economies, agricultural industries, hydrological changes to environments, health issues, and energy use, stating that "There will likely be winners and losers, most probably in the medium term, but in the long term all economies (both developing and developed) might be on the losing end." (Adom, 2024).

To sum it all up, this is a critical issue facing the whole world, and hopefully the models and graphs we've created help to illuminate it, and emphasize the need for future research, more detailed modeling, and applied knowledge to mitigate climate change going forward.

Here's the link to the GitHub repository for this project (you can also click the button below): https://github.com/LeoNgamkam/Data-Mining-Weather-Patterns

GitHub