Data

The data I used comes from the EPA eGRID download page: https://www.epa.gov/egrid/download-data

You will need to download two files from this page: the eGRID2018v2 Data File (XLSX) and the eGRID historical files (1996-2016) (ZIP). In the 2018 data file, we only look at the PLNT18 tab. From the historical files, we only use the 2000 file, eGRID2000_plant.xls, and the 2010 file, eGRID2010_Data.xls. In the 2000 file we look at the EGRDPLNT00 tab, and in the 2010 file we look at the PLNT10 tab.

This is what the original 2018 data file should look like (the 2000 and 2010 data files look similar):

[Image: 2018 data file]

These data files are very large, so I did some preprocessing and cleaning in Python before loading them into R. The Python script for this is in my GitHub repository. The script will NOT work unchanged on all of the files, but you can adapt it by replacing the file name and tab name at the beginning and swapping in the column names you need. The script loads the file, keeps only the columns I needed, renames them, and writes the result out as a CSV file. I recommend using Jupyter Notebook for this.
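To give a rough idea of the shape of that preprocessing step, here is a minimal sketch using pandas. It is not the script from my repository. The function names, the file paths, the output column names, and the column codes in the usage comment (PNAME, PSTATABB, PLNGENAN) are assumptions for illustration, so check them against the tab you are actually reading. Reading .xlsx files with pandas needs the openpyxl package, and the older .xls files need xlrd.

```python
import pandas as pd


def select_and_rename(df, column_map):
    """Keep only the columns listed in column_map and rename them.

    column_map maps original column codes to friendlier names.
    """
    missing = [col for col in column_map if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in sheet: {missing}")
    # Select in the order given by column_map, then rename.
    return df[list(column_map)].rename(columns=column_map)


def sheet_to_csv(excel_path, sheet_name, column_map, csv_path, header_row=0):
    """Read one tab of an Excel file, trim it down, and save it as CSV.

    header_row is the 0-based row that holds the column codes. If your
    sheet has a descriptive row above the codes, you may need header_row=1.
    """
    df = pd.read_excel(excel_path, sheet_name=sheet_name, header=header_row)
    cleaned = select_and_rename(df, column_map)
    cleaned.to_csv(csv_path, index=False)
    return cleaned


# Example call (placeholder paths and assumed column codes):
# sheet_to_csv(
#     "path/to/egrid2018_data.xlsx",
#     "PLNT18",
#     {"PNAME": "plant_name", "PSTATABB": "state", "PLNGENAN": "net_generation_mwh"},
#     "plants_2018.csv",
# )
```

For the 2000 and 2010 files, the same call should work with the file name, the tab name (EGRDPLNT00 or PLNT10), and the column map changed to match that file.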

After cleaning the data, I loaded the CSV files into R and computed totals and percentages for certain energy sources. I also removed rows with NA values in the energy source columns. I highly recommend using RStudio for this step.
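I did this step in R, but the same idea can be sketched in Python with pandas for anyone who wants to stay in one language. This is only an illustration under my own assumptions: the function name and the column names (coal, wind) are hypothetical, and it computes totals and percentages per row, which may not match exactly how you want to aggregate.

```python
import pandas as pd


def add_totals_and_shares(df, source_cols):
    """Drop rows with NA in any energy source column, then add a total
    column and one percentage column per energy source."""
    cleaned = df.dropna(subset=source_cols).copy()
    cleaned["total_generation"] = cleaned[source_cols].sum(axis=1)
    # Mask zero totals so the percentages become NaN instead of dividing by zero.
    denominator = cleaned["total_generation"].where(cleaned["total_generation"] != 0)
    for col in source_cols:
        cleaned[f"{col}_pct"] = 100 * cleaned[col] / denominator
    return cleaned


# Tiny made-up example, not real eGRID values.
example = pd.DataFrame({"coal": [30.0, None, 0.0], "wind": [10.0, 5.0, 0.0]})
result = add_totals_and_shares(example, ["coal", "wind"])
print(result)
```

The second row of the example is dropped because its coal value is missing, and the all-zero row keeps a total of 0 with NaN percentages.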
