Gas Consumption (MGTCP) and Air Quality (AQI) per State
Maggie Mullen, Emily Bette, Jason Zysk, and Andrew Reece
PSYC 500 final project
According to the EPA, motor transportation is the number one contributor to greenhouse gas emissions in the United States. Motor transportation includes civilian vehicles, trucks, tractors, buses, eighteen-wheelers, trains, aircraft, and many more. The EPA reported that "light-duty vehicles" are the largest contributor to greenhouse gas emissions among all motor vehicles, at around 60% of motor vehicle emissions. https://www.epa.gov/greenvehicles/fast-facts-transportation-greenhouse-gas-emissions
Problems related to greenhouse gas emissions include climate change, smog, public health risks, and potentially lower air quality.
The International Journal of Environmental Research and Public Health published the article Pollution from Fossil-Fuel Combustion is the Leading Environmental Threat to Global Pediatric Health and Equity: Solutions Exist, which reports that fossil fuel combustion, such as motor gas consumption, can lead to impaired cognitive and behavioral development, respiratory illness, and other chronic diseases in children and the elderly.
One limitation of this study is that correlation does not necessarily mean causation. Many confounding variables related to overall public health could have been present. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5800116/
The Environmental and Energy Study Institute published Review Finds No Consensus Among Fuel Studies – Major Implications for Health, Air Quality, which discusses the lack of consensus in the literature regarding the health effects of ethanol-blended fuel. This article serves as a rebuttal to the article above, and it is important to read both sides of the argument.
Limitations of this study could include inaccurate reporting of health records and of data related to overall gas consumption and air quality.
Figure 1: Precursor: Table of Air Quality Index Scale and how it relates to public health
Link to our Colab to follow along: https://colab.research.google.com/drive/1wGrfjdZM6oWr2dXnAI7yK_FKDeA1PLnl#scrollTo=96Hyf5nLvyPN
One of our sources was the United States Environmental Protection Agency (EPA). The EPA is a governmental agency that was created to help build a cleaner and healthier environment for American citizens. The EPA works toward conservation, cleaner air, better water quality, and the control of pesticides and other things that are potentially harmful to the environment or citizens.
Another source we used was the United States Energy Information Administration (EIA). The EIA collects and analyzes energy information to contribute to a better understanding of energy consumption and its relationship with the economy and environment. Data from the EIA is commonly used in the formation of new policies and the development of efficient markets.
Data and analyses from national agencies such as the EPA and the EIA are trusted to provide accurate and concise information that shapes the decisions of governments, businesses, and organizations. Therefore, ethical considerations regarding the collection of this data include ensuring that the numbers are accurate and measured in a consistent manner that does not misrepresent results. Federal statistical agencies such as the EIA are considered “principal statistical agencies” because of the high standard of collection and analysis they uphold to maintain quality, integrity, and credibility and to prevent unethical activity or reporting.
Both the EPA and the EIA are involved in maintaining these ethical standards, which include transparency in methodology, objectivity of analyses, independence from government, and trust and credibility of information sources.
EPA
- Monitoring data was collected using hourly and daily measurements from state environmental agencies reported to EPA's Air Quality System.
- Emissions data was collected using source readings and mathematical estimates from state environmental agencies reported to EPA.
EPA air quality testing centers are assigned based on core-based statistical areas (CBSAs). The location of each CBSA is determined by the Office of Management and Budget, and each CBSA consists of one or more counties anchored by an urban center of at least 10,000 people, plus adjacent counties that are socioeconomically tied to the urban center by commuting.
EIA
- EIA-sponsored surveys using consistent and professional methodologies were used to gather information.
- EIA surveys and information systems are documented, and explanatory materials are made available for EIA information products.
EIA's activities in the creation, collection, maintenance, and dissemination of information include:
1. Developing concepts and methods
2. Planning and designing surveys and other means of collecting data
3. Collecting data
4. Processing and editing data
5. Analyzing data
6. Producing estimates and projections
7. Reviewing information products
8. Disseminating information in published reports, electronic files, and other media requested by users.
We chose to analyze data from the EPA and EIA because we all care for the environment and believe that it is a relevant topic in today’s society. We were curious to better understand the impact of gasoline usage on air quality and to observe how these levels change over time and by location, taking population into account. We use data from three years spanning three decades (2000, 2010, and 2018) to better see the long-term trends in air quality and gasoline usage. Separating the data by state also allows us to consider the effect of location. By better understanding how these variables contribute to changes in air quality, we can analyze our results and discuss how they could translate into future plans to improve air quality and quality of life.
Variables in the data frames and the data type of each variable:
Air Quality Index (AQI): described by Figure 1 above. Recorded for each core-based statistical area (CBSA), which is defined by the Office of Management and Budget and consists of the counties surrounding an urban center of at least 10,000 people.
continuous variable
Motor Gas Total Consumption (MGTCP): Motor gas used by state
continuous variable
Population: Population of US states
continuous variable
Year: data was collected on all variables for the years 2000, 2010, and 2018 in order to visualize a trend across decades.
discrete variable
State
discrete variable
EPA datasets including AQI data:
2000 dataset: 19 columns and 1,113 rows, totaling 21,147 cells
2010 dataset: 19 columns and 1,083 rows, totaling 20,577 cells
2018 dataset: 19 columns and 1,056 rows, totaling 20,064 cells
Population dataset from the US Census Bureau with County and State:
3144 columns and 5 rows; 6,288 cells
EIA dataset including motor fuel consumption:
7,855 columns and 62 rows, totaling 487,010 cells
Final Merged Data Frame for 2000
Final Merged Data Frame for 2010
Final Merged Data Frame for 2018
Final Descriptive Statistics for Each Year:
The descriptive statistics for each year shown above suggest that average air quality improved between 2000 and 2018 (the mean AQI was lower in 2018). This is interesting given that mean motor gas consumption and mean population both increased over the same period.
CDF Plots for Air Quality, Gas Consumption, and Mean Population
CDF Plot for Mean AQI
CDF Plot for Mean Gas Consumption
CDF Plot for Mean Population
Permutation Hypothesis Test for AQI
We also ran permutation tests on gas consumption and population (see the Colab for more details).
Mean difference = 3.11
Air Quality vs. Population
The linear regression line and the widely scattered points (not close to the line) show a weak correlation.
Observed correlation between AQI and Population: .24
p-value = 0.0022
If AQI and population are not related, there is roughly a 0.2% chance of observing a correlation of .24 or higher.
Air Quality vs. Gas Consumption
The linear regression line and the widely scattered points (not close to the line) show a weak correlation.
Observed correlation between AQI and MGTCP: .28
p-value = 0.0003
If AQI and MGTCP are not related, there is less than a 0.1% chance of observing a correlation of .28 or higher.
The purpose of this project was to explore the relationship between air quality index (AQI) and motor gas total consumption (MGTCP) across different locations and times. Comparing different states and consecutive decades gives an idea of how population affects these variables. We used datasets from the US Environmental Protection Agency for air quality data, the US Energy Information Administration for gas consumption data, and the US Census Bureau for population data. These are all national agencies that follow strict ethical guidelines for data collection and analysis.
To prepare the data for analysis, we merged the information on AQI, MGTCP, and population from the separate data sources into a single dataset, organized by state. Next, we computed descriptive statistics and visualized the distribution of each variable using bar plots as well as cumulative distribution functions (CDFs) for AQI, MGTCP, and population, both for each of the three years and for all years combined. We also performed permutation testing with the data from the different years combined and produced a linear regression plot between AQI and MGTCP.
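The merge step described above can be sketched roughly as follows. This is a minimal illustration with made-up numbers, not the notebook's actual code: the column names (`State`, `Median AQI`, `MGTCP`, `Population`) and the choice to average CBSA-level AQI within each state are assumptions.

```python
import pandas as pd

# Hypothetical toy inputs; the real EPA, EIA, and Census files have
# different columns and many more rows.
aqi_by_cbsa = pd.DataFrame({
    "State": ["VA", "VA", "MD"],
    "Median AQI": [40.0, 44.0, 38.0],
})
gas = pd.DataFrame({"State": ["VA", "MD"], "MGTCP": [90000.0, 60000.0]})
pop = pd.DataFrame({"State": ["VA", "MD"], "Population": [8500000, 6000000]})

# Collapse CBSA-level AQI readings to one mean value per state (assumed step).
aqi_by_state = (
    aqi_by_cbsa.groupby("State", as_index=False)["Median AQI"]
    .mean()
    .rename(columns={"Median AQI": "Mean AQI"})
)

# Inner joins keep only the states present in all three sources.
merged = (
    aqi_by_state
    .merge(gas, on="State", how="inner")
    .merge(pop, on="State", how="inner")
)

print(merged)
print(merged.describe())  # descriptive statistics for the numeric columns
```

Running the same steps once per year (2000, 2010, 2018) would give one merged data frame per year, which can then be stacked with a `Year` column for combined analyses.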
The CDF for AQI showed that about 50% of AQI values recorded in 2000 and 2010 were 40 or less, while in 2018 about 50% were 35 or less. This demonstrates a slight downward shift in AQI from 2000 and 2010 to 2018. The CDF for MGTCP showed that about 50% of MGTCP values recorded in 2000, 2010, and 2018 were 50,000 thousand barrels or less, indicating no major change across the decades. The CDF for population showed that about 50% of state populations were around 2.5 million or less in 2000, around 3 million or less in 2010, and around 3.5 million or less in 2018. This reflects an overall increase in population across the three decades.
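Reading "50% of values at X or less" off a CDF amounts to finding where the empirical CDF first reaches 0.5. A minimal sketch of that calculation, assuming NumPy and using made-up AQI values (the helper name `ecdf` is ours, not from the notebook):

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Made-up AQI values for illustration only.
aqi_sample = [30, 35, 38, 40, 41, 45, 52, 33]
x, y = ecdf(aqi_sample)

# Smallest observed value at which the ECDF reaches 0.5.
median_from_ecdf = x[np.searchsorted(y, 0.5)]
print(median_from_ecdf)
```

Plotting `x` against `y` as a step plot for each year on the same axes gives the kind of CDF comparison described above.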
The results of the permutation testing for AQI showed that about 35% of areas were measured at an AQI of 35 or less, and in these areas no change was seen in AQI from 2000 to 2018. In 2000, the highest AQI recorded was roughly 45, while in 2018 the highest AQI recorded was roughly 52. This demonstrates that above an AQI of 35, the difference in AQI scores between 2000 and 2018 grows with higher scores. However, this increase is not sufficient to indicate a statistically significant difference: the mean difference was 3.69, resulting in a p value of 0.9948. With this p value we fail to reject the null hypothesis, so we found no statistically significant difference in AQI between 2000 and 2018.
The permutation testing for MGTCP showed no significant difference in the graph between 2000 and 2018. This is reflected in a mean difference of -6324.40, resulting in a p value of 0.3307. With this p value we fail to reject the null hypothesis, so we found no statistically significant difference in MGTCP between 2000 and 2018.
The permutation testing for population showed no significant difference in the graph between 2000 and 2018. This is reflected in a mean difference of -697092.10, resulting in a p value of 0.2923. With this p value we fail to reject the null hypothesis, so we found no statistically significant difference in population between 2000 and 2018.
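For readers unfamiliar with the method, a permutation test for a difference in means can be sketched as below. This is an illustrative version only: it assumes NumPy, uses a two-sided comparison, and runs on made-up values, so the function name, the sidedness, and the numbers are all assumptions and may differ from what the Colab notebook does.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the year labels by permuting the pooled values.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p value is never exactly 0.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Made-up state-level mean AQI values for two years (illustration only).
aqi_2000 = [42.0, 39.5, 45.1, 38.2, 41.7, 44.0]
aqi_2018 = [38.9, 37.2, 43.5, 36.0, 40.1, 41.3]
observed, p_value = permutation_test_mean_diff(aqi_2000, aqi_2018)
print(observed, p_value)
```

A large p value means the shuffled labels often produce a difference at least as extreme as the observed one, which is why the null hypothesis is not rejected in that case.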
The linear regression plot between AQI and population showed a weak positive correlation with an observed correlation of 0.24. The linear regression plot between AQI and MGTCP showed a weak positive correlation with an observed correlation of 0.28.
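The p values reported for these correlations can be understood as the share of label shuffles that produce a correlation at least as large as the observed one. A minimal sketch under stated assumptions (NumPy, a one-sided "or higher" comparison, made-up data; the function name is ours and the notebook's implementation may differ):

```python
import numpy as np

def permutation_test_corr(x, y, n_perm=10_000, seed=0):
    """One-sided permutation test: chance of a correlation >= the observed one."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]  # Pearson correlation
    count = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real pairing between x and y.
        r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if r >= observed:
            count += 1
    # Add-one correction so the estimated p value is never exactly 0.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Made-up paired values standing in for MGTCP and AQI (illustration only).
mgtcp = [20000, 35000, 50000, 65000, 80000, 95000, 110000, 125000]
aqi = [36.0, 41.0, 35.5, 40.0, 43.0, 38.5, 44.0, 42.0]
observed_r, p_value_r = permutation_test_corr(mgtcp, aqi)
print(observed_r, p_value_r)
```

Because the shuffles are random, the estimated p value changes slightly from run to run unless the seed is fixed.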
Limitations include the number of days on which each facility collected AQI data. Data collection did not occur 365 days a year, which could lead to inaccurate representations of air quality because of missed days. Another limitation is the increase in combustion engine efficiency. Over the 18-year span, vehicles have become more efficient in order to reduce pollution. Because we looked at total gas consumption, our analysis doesn’t account for how much pollution was produced per gallon of gas. Additional factors that we did not measure may also have played a role in AQI. For example, because states may share borders with larger states, the larger states’ gas consumption may affect the AQI of neighboring states. Air pollution affects everyone, making it important to know the impact not just on the polluting state but on its neighboring states as well.
Future research should factor in increases in efficiency and quantify the changes in pollution created by gas usage over time. Research on motor vehicle transportation should also account for the increased use of electric vehicles. This research should look at how the electricity for electric vehicles is produced and factor that electricity consumption into the pollution attributed to electric vehicles.
By finding the average air quality per state over time in relation to gas consumption, we produce data that can be given to policy makers at the national and state levels. At the national level, more federal laws could be put into place to decrease gas consumption and, hopefully, improve air quality. At the state level, there could be more recommendations, mandates, and even laws on how to lower transportation needs, increase carpooling, implement more renewable energy sources, and lower total gas consumption. As a community, people can come together to improve the quality of the air everyone lives in and breathes. There could be community education centers, carpooling centers, bikes for rent, or even walking clubs! Air quality is something that many people take for granted. These data show that there are specific variables that can be taken into account in order to, hopefully, create cleaner air for all. One of the main issues is that most people do not realize they directly influence air quality. Showing policy makers and communities data like this makes it easier for them to see objectively what is happening and, hopefully, strive to make changes for the better. Improving air quality conditions requires direct action through policy and the community interventions discussed above.