Environmental protection is the practice of protecting the natural environment by individuals, organizations and governments. Environmental protection is one of the basic prerequisites for the overall development of any country in the world.
Air and water pollution, global warming, deforestation, and wildfires are just a few of the environmental issues that we are facing right now. Reasons why we should protect the environment:
Protecting the environment helps protect the ecosystem. Changes affecting the ecosystem put multiple species in danger of extinction.
Forests provide habitats for a variety of species and raw materials for various consumer products.
The environment is useful in maintaining the earth’s balance.
The main objective of this project is to identify the sentiment of people in the U.S. towards environmental protection and how it varies with respect to regions.
The conclusions of the study can be useful to government and non-profit organizations that have environmental initiatives, and can facilitate their decision-making. The study also serves as a basis for raising awareness among people regarding environmental protection.
Understanding people's emotions allows U.S. lawmakers to listen attentively to their people and formulate laws to meet their needs.
With the 2020 presidential election under way, this study could help both parties identify environment-related areas that have scope for improvement and come up with new proposals to win the election.
The U.S. Government's Role in Environmental Protection:
Beginning in the 1960s, the people of America became increasingly concerned about the environmental impact of industrial growth.
The Clean Air Act of 1963 was the first federal legislation regarding air pollution control.
The Clean Water Act (CWA) of 1972 establishes the basic structure for regulating quality standards for surface waters.
The Endangered Species Act of 1973 is the primary law in the US for protecting endangered species.
The Safe Drinking Water Act (SDWA) of 1974 was established to protect the quality of drinking water in the U.S.
In December 1970, environmentalists achieved a major goal with the establishment of the U.S. Environmental Protection Agency (EPA).
Companies' Eco-friendly Initiatives:
Some of the initiatives being taken by companies towards environmental protection:
Established carmakers around the world are moving towards electric car production, as electric cars emit fewer greenhouse gases and air pollutants over their lifetime than petrol or diesel cars.
Businesses in the U.S. such as Amazon, Macy's, and IKEA are investing record amounts in solar power, which helps combat greenhouse gas emissions.
Recycling helps reduce greenhouse gas emissions, which helps to tackle climate change. Many U.S. companies are crafting their products from recycled plastics.
1. Is there any difference in people’s attitude towards the environment based on regions?
2. Does the expression of people's sentiments and opinions vary between digital print and social media?
Digital Print media data:
News articles from five top newspapers that cover all the major regions of the US. Below are the newspapers that will be used for the project.
New York Times
Los Angeles Times
The Washington Post
Chicago Tribune
Denver Post
Social Media data:
Data taken from Facebook and Twitter related to environmental protection.
Environmental protection has always been practiced by humans in one form or another. However, as human-induced pressures on the environment have escalated over the past century, the need for systematic environmental protection has increased. This has led to considerable experimentation with the domestic and international measures that are used to achieve environmental protection objectives. A study by Kyle Van Houtan, chief scientist at Monterey Bay Aquarium, and his team used sentiment analysis to assess the successes and failures of wildlife conservation over time. They assessed the abstracts of more than 4,000 studies of species reintroduction across four decades, using natural language processing to extract usable information, and found that we are getting better and better at reintroducing species to the wild.
Another study, by Reyes-Menendez, Saura, & Alvarez-Alonso (2018), focused on understanding #WorldEnvironmentDay user opinions on Twitter. They downloaded the 5,873 tweets that used the hashtag #WorldEnvironmentDay on the respective day and performed sentiment analysis to group them according to the feelings expressed. The results of their analysis established the key factors that most concern users about the environment and public health.
Unlike the above-mentioned studies, which performed sentiment analysis on abstracts or social media data only, this project uses digital print media data in addition to social media data related to environmental protection. Sentiment analysis will be performed to evaluate the attitude of people towards environmental protection and to look for trends with respect to regions. Finally, we will identify any differences in the expression of people's sentiments and opinions between digital print and social media.
Text extracted from the news articles and social media is preprocessed through tokenization; removal of punctuation, non-alphabetic tokens, single-character words, and stop words; and lemmatization. Word count and file size are used to check that the digital print media and social media datasets are balanced. The cleaned text is then used to calculate sentiment scores with the VADER library. VADER sentiment analysis relies on a dictionary that maps lexical features to emotion intensities, known as sentiment scores. Fig.1 shows an overview of the source data.
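The preprocessing and lexicon-based scoring described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the project's actual pipeline: the stop-word list and the `TOY_LEXICON` valence values are made up for the example, lemmatization is left out because it needs an NLP library, and the squashing step in `compound_score` is modeled on the normalization VADER applies to its compound score (alpha = 15).

```python
import math
import string

# Tiny illustrative stop-word list (assumption); a real pipeline would use a full one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for", "on", "by"}

# Hypothetical mini-lexicon with made-up valence values, for illustration only.
TOY_LEXICON = {"clean": 1.7, "protect": 1.6, "pollution": -2.1, "toxic": -2.5}


def preprocess(text):
    """Tokenize, strip punctuation, and drop non-alphabetic, one-letter, and stop-word tokens."""
    tokens = (raw.strip(string.punctuation) for raw in text.lower().split())
    return [t for t in tokens if t.isalpha() and len(t) > 1 and t not in STOP_WORDS]


def compound_score(tokens, alpha=15.0):
    """Sum the lexicon valences of the tokens and squash the total into (-1, 1)."""
    total = sum(TOY_LEXICON.get(t, 0.0) for t in tokens)
    return total / math.sqrt(total * total + alpha)


tokens = preprocess("The city is moving to clean energy by 2050!")
score = compound_score(tokens)
```

Here `tokens` keeps only the content words, and `score` is positive because the only lexicon hit is "clean". With the real VADER library the lexicon is far larger and also handles negation and intensifiers, which this sketch does not.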
The exploratory data analysis is performed in Python, with data visualization implemented using the Matplotlib and Plotly libraries to produce interactive plots.
Fig.2, Fig.3(a) and Fig.3(b) show a bar graph and Plotly choropleth maps of the region-wise trend.
Fig.4 and Fig.6 show bar graphs of the month-wise trend for each region and the word count trend of social media data, with a drop-down menu for each plot.
A comparison between social media and digital print media is shown in Fig.5.
Click on the figure names to see the interactive plots; they link to HTML pages that show more information about each data point as hover text.
Fig.2 shows the region-wise trend of people's sentiment towards environmental protection for the five regions Chicago, Denver, Los Angeles, New York and Washington DC.
From the figure, Los Angeles has the highest positive sentiment and Chicago has the highest negative sentiment.
Most of the positive sentiment of the Los Angeles region is contributed by news articles related to the transition towards clean energy.
The Chicago region's negative sentiment is contributed by news related to ethylene oxide emissions and the impact of the Coronavirus pandemic.
The trend of positive and negative sentiment is shown using choropleth maps in Fig.3(a) and Fig.3(b).
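A region-wise breakdown like the one plotted in Fig.2 can be computed by labelling each document's compound score and taking the share of each label per region. The sketch below assumes the commonly used VADER cut-offs of +0.05 and -0.05 for positive and negative; the report does not state which thresholds were actually used, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def label(compound, threshold=0.05):
    """Map a compound score to a sentiment label (threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_share(records):
    """records: iterable of (region, compound_score). Returns percentage of each label per region."""
    counts = defaultdict(Counter)
    for region, compound in records:
        counts[region][label(compound)] += 1
    shares = {}
    for region, c in counts.items():
        total = sum(c.values())
        shares[region] = {k: 100.0 * c[k] / total for k in ("positive", "neutral", "negative")}
    return shares
```

The resulting dictionary maps each region to three percentages, which is the shape a grouped bar chart or a choropleth colour scale needs.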
Fig.4 shows the month-wise trend. The Chicago region's negative sentiment includes news of the environmental agency’s workforce layoffs and the bypassing of environmental rules during the pandemic.
New York's negative sentiment includes news of the rollback of climate regulations.
Los Angeles' commitment to zero carbon emissions by 2050 and to achieving the goals of the Paris Climate Agreement adds to its positive sentiment.
Denver's positive sentiment is driven by the news of a requirement that at least 5% of automakers’ vehicles available for sale by 2023 be electric.
The Washington region's negative sentiment is driven by the weakening of regulations that would have forced coal plants to treat wastewater.
Fig.5 compares sentiment across the two media, with digital print media being more neutral than social media.
The month-wise trend of sentiment and word count of social media is shown in Fig.6.
The surge in word count for September and October 2020 reflects the increased importance of the 'environment' topic during the presidential election debates.
Nonparametric tests are used to determine whether two or more data samples have the same or different distributions. I applied the tests to confirm that the sentiment differs across regions. The null hypothesis (H0) is that the samples have the same distribution, and the decision rule for the tests is:
p <= alpha: reject H0, different distribution.
p > alpha: fail to reject H0, same distribution.
The Kruskal-Wallis test compares two or more samples, whereas the Mann-Whitney test performs a pairwise comparison of samples.
Both tests gave a p-value < 0.05 (shown above), which indicates that the sentiment differs by region.
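To make the pairwise test and the decision rule concrete, here is a small standard-library sketch of a two-sided Mann-Whitney U test using the normal approximation. It is an illustration only: it omits the tie and continuity corrections, so its p-values will differ slightly from a statistics library's, and in practice `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` are the usual tools. The function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"
```

For two hypothetical regions whose scores do not overlap at all, for example `[0.1, 0.2, 0.3, 0.4, 0.5]` and `[0.6, 0.7, 0.8, 0.9, 1.0]`, U is 0 and `decide` returns "reject H0" at alpha = 0.05.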
Build and compare five machine learning models using the Chicago region data and select the one with the best performance.
Build a Logistic Regression model with C value 0.05
Train using month-wise data of the Chicago region
Plot the heat map of the model's confusion matrix to show the TP, FP, TN and FN values. Accuracy of the model is 60%.
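The TP, FP, TN and FN values shown in such a heat map, and the accuracy derived from them, can be computed as in this minimal sketch. The label vectors in the usage note are hypothetical and are not the project's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def accuracy(counts):
    """Accuracy = correct predictions / all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total
```

With `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 1, 0, 1, 0]`, the counts are TP = 2, FP = 1, TN = 1, FN = 1, giving an accuracy of 0.6.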
Build the SVM classifier with linear kernel
Create the classification report to show the performance metrics.
Accuracy of the model is 60%
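A classification report typically lists precision, recall, F1 and support per class. The sketch below computes those four numbers by hand to show what the report contains; it is not the library routine the project used, and `per_class_report` is a hypothetical name.

```python
def per_class_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1", "support"}} for every label seen."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report
```

Reading precision and recall per class alongside accuracy helps when the positive and negative classes are not equally frequent, since accuracy alone can hide a model that favours the majority class.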
Build the model
Using time series generator
Sequential for initializing the neural network
LSTM for adding the Long Short-Term Memory layer with 100 units, followed by a Dense layer with sigmoid activation
Compile the model
binary_crossentropy loss, adam optimizer and accuracy metric.
Preprocess using MinMaxScaler and fit the model for 50 epochs.
Accuracy obtained is 66.7%.
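The two data-preparation steps named above, min-max scaling and time-series windowing, can be illustrated without any deep learning library. This sketch shows the arithmetic only; the window length is an assumed parameter (the report does not state it), and the helper names are hypothetical stand-ins for what MinMaxScaler and a time series generator do.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, length):
    """Pair each run of `length` consecutive values with the value that follows it."""
    return [(series[i:i + length], series[i + length]) for i in range(len(series) - length)]
```

For the series `[1, 2, 3, 4, 5]` and a window length of 3, `make_windows` yields `([1, 2, 3], 4)` and `([2, 3, 4], 5)`: each input window is matched with the next observation as its target, which is the supervised form an LSTM is trained on.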
Build the KNN model with K value ranging from 1 to 8 for comparison.
Plot the relationship between K and the testing accuracy.
Pick the K value with the highest testing accuracy, K = 3
Train and evaluate the performance. Accuracy obtained is 75%.
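The K sweep described above can be sketched with a small hand-written nearest-neighbour classifier. This is an illustration under stated assumptions: Euclidean distance, majority vote, and ties between K values broken in favour of the smallest K. The report does not say how these were configured, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Predict the majority label among the k training points closest to `point`."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], point))[:k]
    votes = Counter(lbl for _, lbl in nearest)
    return votes.most_common(1)[0][0]


def accuracy_by_k(train_X, train_y, test_X, test_y, k_values):
    """Return {k: testing accuracy} for every k in k_values."""
    results = {}
    for k in k_values:
        correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
        results[k] = correct / len(test_y)
    return results


def best_k(results):
    """Pick the k with the highest accuracy; the smallest k wins ties."""
    return max(sorted(results), key=results.get)
```

Calling `accuracy_by_k(..., range(1, 9))` gives the K-versus-accuracy relationship to plot, and `best_k` selects the value to train the final model with. Note that `math.dist` requires Python 3.8 or later.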
Build the Random Forest model with n_estimators=100
Evaluate the performance and plot the ROC curve for the model. The roc_auc_score for the Random Forest model is 0.75.
Accuracy of the model is 80%.
The forecast sentiment is negative for the Chicago, New York and Washington regions and positive for the Denver and Los Angeles regions. It is positive for Twitter and Facebook data. Plots for all the regions are available on GitHub.
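The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores in the usage note are hypothetical, not the Random Forest's actual outputs.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For `y_true = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. A value of 0.5 would mean the scores rank no better than chance.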
Include news articles and social media data related to climate change, global warming, extreme weather, water/air pollution, deforestation, wildlife and so on.
Data extraction can be done on a weekly basis instead of monthly, which would help in assessing the sentiment of regions more accurately.
The number of regions for analysis can be increased, or the analysis can be extended to the state level.
Develop a web-based dashboard, accessible to anyone online, that shows the sentiment scores of each region.
With additional source data, more robust machine learning models can be built that forecast more accurately.
References:
Van Houtan et al. (2020). Sentiment analysis of conservation studies captures successes of species reintroductions. Patterns. DOI: 10.1016/j.patter.2020.100005
Reyes-Menendez, Ana & Saura, José & Alvarez-Alonso, Cesar. (2018). Understanding #WorldEnvironmentDay User Opinions in Twitter: A Topic-Based Sentiment Analysis Approach. International Journal of Environmental Research and Public Health. 15. DOI: 10.3390/ijerph15112537
Brownlee, J. (2019, August 8). How to Calculate Nonparametric Statistical Hypothesis Tests in Python. Machine Learning Mastery. https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/