Models Implemented

Answering Our Research Questions Using Data Modeling & More

We used a combination of modeling, further graphing, and outside research to answer our ten questions, and we answered them out of order, as some questions needed to be addressed first to provide background for later ones. The models required minimal data remodeling, but some reconfiguration of our code was needed.

What have been the past patterns of El Niño and La Niña intensity? 

We can see, based on the current output, the following trends:

Have recent environmental efforts such as adaptation, geo-engineering, or mitigation strategies produced any significant changes regarding the climate? 

We found this question best answered by summarizing research from outside sources, as its scope was largely beyond what we could predict accurately with our data.


The Pacific ENSO Applications Center (PEAC)'s strategy is to foster collaborative efforts to overcome these adversities, prepare the necessary mitigation and adaptation strategies, and improve public awareness of oceanic regions (Schroeder et al. 2012). Island climate change preparedness measures include coastal protection structures such as sea walls, elevated homes, improved agroforestry, reef restoration, planting native plants to reduce beach erosion, better fishing technologies and equipment, and diversified (salt-tolerant) crops (Mycoo et al. 2022). 

Mitigation Strategies:

Adaptation Strategies:


Further comment comes from a paper in the academic journal Environmental Chemistry Letters, which, after examining over ten conventional and alternative methods of climate change mitigation, concluded that current efforts were insufficient to meet the goals set by the 2015 Paris Agreement, and that further development of alternative methods was critical to future climate mitigation efforts (Fawzy et al. 2020).

In short, based on both the literature and the graph above, which shows renewable energy at a mere 12% of the U.S. energy mix in 2020, recent environmental efforts have failed to produce a significant change or mitigation in climate change, whether in our chosen region of Polynesia/Oceania or around the world. If countries with the wealth and innovation to create new energy sources and climate efforts are failing, then smaller island nations, which sometimes lack even the critical infrastructure to protect themselves from current climate risks, are highly unlikely to succeed.

How have El Niño and La Niña years impacted the Oceanic regions? 

To answer this question, a graph was made for each of the 3 main regions of study: Polynesia, Micronesia, and Melanesia.

The graph for Polynesia shows the outputs for changes in the El Niño and La Niña outputs, as well as the overall ENSO cycle change. We can see that:

For Micronesia, we can see that:

For Melanesia, we can see:

Overall, the data showcases:

High temperatures in past data, especially during El Niño years, with a trend toward temperature increases of 0.5 to 1.5 degrees Celsius.

Depending on the degree to which climate change was to be altered in the future, would ENSO severely affect the oceanic regions? If so, to what extent? And based on predictive data, which Oceanic regions would be the most and least impacted? 

As with the last question, there is a set of graphs for each of the three main regions, beginning with Polynesia:

The graphs for Polynesia showcase:

The data for Micronesia shows:

For the future Melanesia data:

Overall, this tells us that the increase in temperature demonstrated by past data is predicted to continue into the future, with potentially larger temperature increases, hotter regions overall, and less variation between the El Niño and La Niña seasons as the effects of climate change continue to grow.


Based on all of this data, we can estimate that of the three given regions, Melanesia is likely to be the most impacted, due to its greater warming as well as reduced cooling over larger land masses such as New Guinea, which suggests that the smaller land masses around the area are likely to be worse off as well.

Linear Regression Model

For the Linear Regression and Logistic Regression models, the data had to be normalized using the StandardScaler class from sklearn.preprocessing (via its fit_transform method), similar to our past assignments and classwork. Below is a snapshot of the non-normalized data:

And below is the normalized data:

Finally, we have a printed snippet of the data after it was divided into a training set and a testing set, with the predictors (the weather features and precipitation) and the outcome indexed separately. The outcome column was not scaled, since it is the feature being predicted:
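A minimal sketch of this preprocessing step on synthetic stand-in data (the array shapes and random values here are illustrative assumptions, not the project's actual dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.random((100, 2))  # stand-in for the weather feature columns
y = rng.random(100)       # stand-in for the (unscaled) precipitation outcome

# Split first, then fit the scaler on the training set only, so no
# information from the test set leaks into the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit + transform on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0))  # each column now has ~zero mean
```

Fitting the scaler on the training split and only transforming the test split is the standard way to keep the evaluation honest.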

One of the model types we utilized to examine our data was a linear regression model. For computations performed to clean and normalize the data, please see the Visualizations/Cleaning tab.

To evaluate the effectiveness of each model, we used accuracy: the percentage of correct classifications a trained model achieves. Each model is also programmed to search for its best hyperparameters (external configuration variables) as extra information. The following is the output for the linear regression model:

Polynesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.1240711426293525

Best Hyperparameters: {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}

===========

Micronesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.28731666413606766

Best Hyperparameters: {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}

===========

Melanesia

Development Set Mean Absolute Error (MAE) (Best Model): 0.2070017014339661

Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}

===========

This output reports each region's Development Set Mean Absolute Error (MAE), which is an error metric rather than an accuracy: lower values are better, so the Polynesia model (MAE ≈ 0.124) performs best and the Micronesia model (MAE ≈ 0.287) worst. Even so, the linear models do not fit the data well. This was somewhat anticipated, given that linear models are poor at predicting non-linear data, which temperature and atmospheric data almost always are. However, we tried a linear regression model anyway to test ourselves and the data: if a linear pattern had emerged, it might have hinted at errors in the data, or at an unusual linear relationship we could extrapolate for our conclusion. Neither outcome appeared here.
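As a reference point, MAE for any fitted regressor can be computed as below; the data here is synthetic (with a deliberately non-linear component), not the project's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 2))
# Synthetic target with a non-linear sine component, standing in for
# atmospheric data that a linear model cannot fully capture.
y = X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 200)

X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# MAE averages the absolute prediction errors; lower is better.
mae = mean_absolute_error(y_dev, model.predict(X_dev))
print(f"Development Set Mean Absolute Error (MAE): {mae:.4f}")
```

Because the sine term is invisible to a straight-line fit, the MAE stays well above the noise floor, mirroring the poor linear fits seen above.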

Logistic Regression Model

For the logistic regression model, the base dataset was split into a new training and testing pair, and indexed based on weather features and the outcome. Precipitation in particular was binarized based on the mean of the training set, and a new column was created for the data based on the classification outcome of the model. The following is the output of the training data, and the x and y sets for the training data:

The Logistic Regression model was our second model type, created using cleaning methods similar to those used for the Linear Regression model.


The outcomes for each region, based on the logistic regression model, are:

Results

Polynesia:

Best Hyperparameters: {'C': 10, 'penalty': 'l2'}

Accuracy on Test Data: 0.5138339920948617

=========== 

Micronesia:

Best Hyperparameters: {'C': 0.1, 'penalty': 'l2'}

Accuracy on Test Data: 0.549407114624506

===========

Melanesia:

Best Hyperparameters: {'C': 1, 'penalty': 'l2'}

Accuracy on Test Data: 0.4743083003952569

===========

Going off the accuracy, we can see a significantly higher correct-prediction rate compared to the linear regression model, with the highest accuracy for Micronesia at about 55%. Curiously, each region has a different ideal value of C (10, 0.1, and 1, respectively), which may reflect how each region's data responded to the model.
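The binarization-and-search pipeline described above can be sketched as follows; the synthetic data and grid values are illustrative, though the searched hyperparameters (C and penalty) match the outputs shown:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.random((300, 2))                          # stand-in weather features
precip = X[:, 0] * 2 + rng.normal(0, 0.3, 300)    # stand-in precipitation

X_train, X_test, p_train, p_test = train_test_split(
    X, precip, test_size=0.25, random_state=1
)

# Binarize precipitation against the *training-set* mean, as described above,
# so the threshold doesn't leak test-set information.
threshold = p_train.mean()
y_train = (p_train > threshold).astype(int)
y_test = (p_test > threshold).astype(int)

# Search over the same hyperparameters reported in the output: C and penalty.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10], "penalty": ["l2"]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)
print("Best Hyperparameters:", grid.best_params_)
print("Accuracy on Test Data:", grid.score(X_test, y_test))
```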

What are the primary meteorological variables to severely impact the oceanic regions based on magnitude? 

The above graphs answer this question by fitting recursive feature elimination (RFE) on the linear regression model with the normalized data, and graphing the resulting importance calculated for each feature. We can see a similar trend for each region: WSPDSRFAV, or average surface wind speed, is almost statistically insignificant compared to the importance of TS, or temperature, as a predictor. This makes sense, as temperature affects both land masses and the ocean, is a primary factor in storm formation, and drives many socioeconomic factors such as farming or fishing seasons. Below are the exact values for each feature's importance:

Polynesia

[0.05100927 0.94899073]

===========

Micronesia

[0.10951039 0.89048961]

===========

Melanesia

[0.06755424 0.93244576]

===========
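The ranking step above can be sketched like this; the synthetic data, coefficients, and the choice to derive sum-to-one importances by normalizing absolute regression coefficients are all assumptions for illustration, not the project's exact code:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Two synthetic predictors standing in for WSPDSRFAV (wind) and TS (temperature).
wind = rng.normal(5, 1, 200)
ts = rng.normal(25, 2, 200)
y = 0.05 * wind + 0.9 * ts + rng.normal(0, 0.5, 200)
X = np.column_stack([wind, ts])

# RFE ranks features by repeatedly dropping the weakest (smallest-|coef|) one.
rfe = RFE(LinearRegression(), n_features_to_select=1).fit(X, y)
print("RFE ranking (1 = kept last):", rfe.ranking_)

# One way to get importances that sum to 1, as in the printout above,
# is to normalize the absolute regression coefficients.
coefs = np.abs(LinearRegression().fit(X, y).coef_)
rel = coefs / coefs.sum()
print("Relative importance [wind, ts]:", rel)
```

As in the real outputs, the temperature stand-in dominates and the wind stand-in contributes only a small fraction of the total importance.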

How does the historical reanalysis data of the Oceanic Regions and their associated islands compare to its predictive data? 

To answer this question, we had to create our final two models: a feedforward neural network (FNN) and a convolutional neural network (CNN). An FNN is a broad type of neural network with only one direction of information flow: forward. A CNN is an optimized FNN characterized by its use of filters, or kernel optimization. The creation and code snippets can also be seen on the Visualizations tab. First, a training set, testing set, and validation set were created from our collected data, which were then split into features and outcomes. Then, the data was normalized. Below is a view of the neural network data:

Feedforward Neural Network

The FNN is created using stacks of 'dense' and 'dropout' layers, which are fed data images that are flattened into vectors. Below is a snippet of the image 'vector', though it isn't very exciting:

And below is a printout of one model's output from the function that created it, listing its layers and parameters. There are several models; this is just one, to illustrate the output:

And then, we have the outputs for each model, for each region we examined with the data, giving the hyperparameters, and the accuracy for the training and validation sets:

Polynesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 732.9714, Training Accuracy: 27.0733

Validation Loss: 765.2873, Validation Accuracy: 27.6637

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 713.6483, Training Accuracy: 26.7141

Validation Loss: 744.9549, Validation Accuracy: 27.2937

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 810.1848, Training Accuracy: 28.4636

Validation Loss: 846.2267, Validation Accuracy: 29.0898

Number of Parameters 3: 415425

Micronesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 846.7216, Training Accuracy: 29.0984

Validation Loss: 882.0287, Validation Accuracy: 29.6989

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 844.9132, Training Accuracy: 29.0673

Validation Loss: 880.2386, Validation Accuracy: 29.6688

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 919.4033, Training Accuracy: 30.3216

Validation Loss: 957.8805, Validation Accuracy: 30.9496

Number of Parameters 3: 415425

Melanesia

Hyperparameters 1: {'num_layers': 3, 'num_nodes': [10, 8, 8], 'activat_fun': 'linear', 'dropout': [0.1, 0.1, 0.2], 'weight_reg': <keras.src.regularizers.L1 object at 0x7ba99f01bca0>}

Training Loss: 824.6379, Training Accuracy: 28.7165

Validation Loss: 861.5817, Validation Accuracy: 29.3527

Number of Parameters 1: 63413


Hyperparameters 2: {'num_layers': 3, 'num_nodes': [128, 64, 32], 'activat_fun': 'linear', 'dropout': [0.15, 0.1, 0.05], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f0181c0>}

Training Loss: 859.6379, Training Accuracy: 29.3195

Validation Loss: 898.3053, Validation Accuracy: 29.9717

Number of Parameters 2: 819457


Hyperparameters 3: {'num_layers': 4, 'num_nodes': [64, 64, 64, 32], 'activat_fun': 'linear', 'dropout': [0, 0, 0.05, 0], 'weight_reg': <keras.src.regularizers.L2 object at 0x7ba99f019360>}

Training Loss: 825.5211, Training Accuracy: 28.7318

Validation Loss: 862.7159, Validation Accuracy: 29.3720

Number of Parameters 3: 415425


We can see from the above outputs that accuracy actually tends to be slightly higher on the validation sets than on the training sets. Overall, the accuracy rate is similar to the linear regression model's, which is surprising, but could be due to the data's input format, or to the nature of the values as atmospheric data, a good portion of which is itself predictive in nature. 
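For illustration, a dense/dropout stack like the "Hyperparameters 1" configuration above could be sketched in Keras as follows. The input width of 4 features, the L1 strength, and the compile settings are assumptions, not the project's actual values:

```python
import keras
from keras import layers, regularizers

def build_fnn(num_nodes, dropout, activation="linear", input_dim=4):
    """Stack of Dense + Dropout pairs, one pair per entry in num_nodes."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for nodes, rate in zip(num_nodes, dropout):
        model.add(layers.Dense(nodes, activation=activation,
                               kernel_regularizer=regularizers.L1(0.01)))
        model.add(layers.Dropout(rate))
    model.add(layers.Dense(1))  # single regression output
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Mirrors the "Hyperparameters 1" configuration from the output above.
model = build_fnn(num_nodes=[10, 8, 8], dropout=[0.1, 0.1, 0.2])
model.summary()  # lists layers and parameter counts, as in the printout
```

With a 4-wide input the parameter count is tiny; the much larger counts in the real outputs reflect the actual input dimensionality of the project's data.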

Convolutional Neural Network

This code cleaning can also be seen on the Visualization/Cleaning tab. The CNN is trained very similarly to an FNN, but adds convolutional layers, with more emphasis on robust dropout layers. Additionally, the data isn't flattened, but is instead reshaped to approximate the original raw 3D form in which we imported it. A snippet of the resulting vectors, as interpreted by Colab, is below:

Below there's also an output for the CNN, to demonstrate it in contrast to the FNN:

And here are the accuracy ratings and parameter numbers for each CNN:

Polynesia

Training Loss: 0.0334, Training Accuracy: 0.0561

Validation Loss: 0.0336, Validation Accuracy: 0.0499

Number of Parameters 1: 1512705


Training Loss: 0.0335, Training Accuracy: 0.0531

Validation Loss: 0.0335, Validation Accuracy: 0.0536

Number of Parameters 2: 293497


Training Loss: 0.0337, Training Accuracy: 0.0848

Validation Loss: 0.0334, Validation Accuracy: 0.0762

Number of Parameters 3: 556257

Micronesia

Training Loss: 0.0587, Training Accuracy: 0.1628

Validation Loss: 0.0601, Validation Accuracy: 0.1699

Number of Parameters 1: 1512705


Training Loss: 0.0323, Training Accuracy: 0.0704

Validation Loss: 0.0313, Validation Accuracy: 0.0712

Number of Parameters 2: 293497


Training Loss: 0.0326, Training Accuracy: 0.0826

Validation Loss: 0.0317, Validation Accuracy: 0.0827

Number of Parameters 3: 556257

Melanesia

Training Loss: 0.0511, Training Accuracy: 0.1335

Validation Loss: 0.0538, Validation Accuracy: 0.1469

Number of Parameters 1: 1512705


Training Loss: 0.0334, Training Accuracy: 0.0562

Validation Loss: 0.0323, Validation Accuracy: 0.0562

Number of Parameters 2: 293497


Training Loss: 0.0351, Training Accuracy: 0.0426

Validation Loss: 0.0341, Validation Accuracy: 0.0438

Number of Parameters 3: 556257


One thing to note that isn't shown here is that the CNN models can take further hyperparameters, potentially increasing the output's accuracy. Those hyperparameters aren't shown due to the constraints of the website's text boxes, as they're quite long strings of parameters.
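As a rough illustration of the reshaping and convolutional stack described above, here is a minimal CNN over a small assumed grid; the (8, 8, 1) shape, layer sizes, and synthetic data are all assumptions made purely for this sketch:

```python
import numpy as np
import keras
from keras import layers

rng = np.random.default_rng(3)
# Hypothetical (samples, height, width, channels) shape; the real data's
# raw 3D form isn't shown, so an 8x8 single-channel grid is assumed.
X = rng.random((32, 8, 8, 1))
y = rng.random(32)

model = keras.Sequential([
    keras.Input(shape=(8, 8, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # learned filters
    layers.MaxPooling2D(),                                # downsample feature maps
    layers.Dropout(0.2),                                  # dropout, as noted above
    layers.Flatten(),
    layers.Dense(1),                                      # regression output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=1, verbose=0)
```

The key contrast with the FNN is the front of the stack: the input keeps its spatial shape so the convolution's filters can slide over it, and flattening happens only just before the dense output.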


In terms of answering the question, this shows us that reanalysis of the historical data, despite the earlier similarities seen in the regional graphs, doesn't seem very accurate at predicting future data. For the predictions to be significant at all, we would expect accuracy values around 0.7 to 0.8, or at least 0.5, but most models barely scratch 0.2. This may be due to the reshaping of the data, or to the limitations of FNNs and CNNs as 'general' neural networks.

To what extent does climate change influence the magnitude of El Niño and La Niña in a warmer climate? 

This question was answered by separating the El Niño, La Niña, and ENSO cycle data and indexing it by both future and past climatology, in order to build an accurate picture of how these cycles would play out in a warmer climate:

Based on the future data, we can expect spikes in La Niña cycles for the next two decades, after which the El Niño cycle would take over as more 'dominant' in influencing the climate. This is likely because, as the world continues to warm, climate change pushes the ENSO cycle toward a stronger El Niño, the part of the cycle that primarily controls warmer years. Meanwhile, if we base these predictions on past data, we see a much more dramatic swing toward El Niño dominance, likely because the lower temperatures of past decades make the future changes look more drastic. From this, we can predict a future that continues toward a warmer, more unstable climate.
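The indexing described above amounts to splitting the records by climatology window and comparing phases; a hedged pandas sketch, where the column names ('year', 'phase', 'anomaly') and values are illustrative stand-ins, not the project's dataset:

```python
import pandas as pd

# Illustrative stand-in records; the real column names differ.
df = pd.DataFrame({
    "year":    [1998, 2005, 2016, 2035, 2050, 2070],
    "phase":   ["El Nino", "La Nina", "El Nino", "La Nina", "El Nino", "El Nino"],
    "anomaly": [1.4, -0.9, 1.6, -1.1, 1.9, 2.2],
})

# Index by past vs. future climatology windows.
past = df[df["year"] <= 2024]
future = df[df["year"] > 2024]

# Compare mean anomaly by ENSO phase within each window.
print(past.groupby("phase")["anomaly"].mean())
print(future.groupby("phase")["anomaly"].mean())
```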

Based on predictive data, what are some socio-economic aspects of the Oceanic regions that could be worsened by climate change? 

This question can't be answered by creating more graphs or models, but by examining our existing data, graphs, and models for signs of which factors might be worsened by climate change. For starters, climate change interrupts the established pattern of seasons that many regions rely on for growing food, as well as the growing periods for wild vegetation. Changes in those periods can in turn affect wildlife, often by depriving them of food. Regions with too many hot days, too many cold days, too many days without rain, or too many days with rain can lose the crops and vegetation critical to them; the climate can be summed up by the phrase 'everything in moderation', since too much of any one variable, be it wind, temperature, or rainfall, can be devastating. Increased temperatures also melt sea ice and raise sea levels, which are especially harmful to oceanic regions: their low elevations and long coastlines are easily swallowed by encroaching oceans, depriving residents of land to live and work on. An erratic climate can also lead to more typhoons and other tropical storms, damaging infrastructure the area depends on. A report released by the Center for Global Development reviewed potential socioeconomic issues that could result from climate change in the coming decades and found evidence of threats to economies, agricultural industries, hydrological changes to environments, health issues, and energy use, stating that "There will likely be winners and losers, most probably in the medium term, but in the long term all economies (both developing and developed) might be on the losing end." (Adom, 2024).

To sum it all up, this is a critical issue facing the whole world, and hopefully the models and graphs we've created help to illuminate it, and emphasize the need for future research, more detailed modeling, and applied knowledge to mitigate climate change going forward.

Here's the link to the GitHub repository for this project (you can also click the button below): https://github.com/LeoNgamkam/Data-Mining-Weather-Patterns

GitHub