Once I had a usable dataset, I started coding in R to clean the data and build visualizations. The graphics were key for spotting trends, dips, and peaks in the data. I worked in R inside a Jupyter Notebook, and all of my data visualizations were made with ggplot2.
Before using ggplot2 to create the graphics, I needed to clean the data so that it could be used with R. I started with two .csv files (1970to1984.csv and 1985toNow.csv) and used rbind to create a complete dataset from 1970-2018.
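As a rough sketch of that step, the snippet below stacks two data frames with rbind. The column names (YEAR, JANE, JANI) and every number in it are placeholders I made up for illustration, since the real layout of the .csv files is not shown here; with the actual files, the two data frames would come from read.csv instead.

```r
# Placeholder stand-ins for the two files. Column names and values are
# invented for illustration only; they are not the real trade figures.
early <- data.frame(YEAR = c(1970, 1984), JANE = c(1, 2), JANI = c(3, 4))
late  <- data.frame(YEAR = c(1985, 2018), JANE = c(5, 6), JANI = c(7, 8))

# With the real files this would be something like:
# early <- read.csv("1970to1984.csv")
# late  <- read.csv("1985toNow.csv")

# rbind() stacks rows, so both data frames need the same column names.
stopifnot(identical(names(early), names(late)))
fulldataset <- rbind(early, late)

# Keep the rows in chronological order.
fulldataset <- fulldataset[order(fulldataset$YEAR), ]
head(fulldataset)
```

Checking the column names before calling rbind is optional, but it catches a renamed or missing column early instead of failing later in the plotting code.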
This is what the heading of "fulldataset" looked like:
This format would not work with ggplot2 and did not have the correct labeling and subsetting, so after various changes, this is what the final dataset, "chrono," looked like:
YEAR: The year that the statistic corresponds to
variable: The month and type of trade that the value belongs to (...E for exports and ...I for imports)
value: The trade value in millions of US dollars, not seasonally adjusted and not adjusted for inflation
CPI: The Consumer Price Index value of that year to use for adjusting the value for inflation.
Month_No: The month as a number (Jan-Dec = 1-12). I added this because numeric months are easier to sort and plot in R than month names.
AdjustedValue: The value adjusted for inflation using the CPI value, expressed in 2018 US dollars.
MONTH: The month that the data corresponds to.
TickNames: The month that the data corresponds to but with the year included with January (1971 JAN, 1972 JAN, etc.). I created this to use with some of the graphics so that the x-axis labeling was not overly cluttered.
Date: The month and year combined into a single integer, since a numeric date is easier to order and plot in R than a text label.
Type: Numerical differentiation of type of trade – 1 is exports and 2 is imports.
JanJulTickNames: Another column that I created for labeling some of the graphics. It labels the data using January and July together with the year.
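To make the column list more concrete, here is a minimal sketch of how several of these columns could be derived in base R. It is not the exact code I ran: the wide column names (JANE, JANI, ...), the CPI numbers, and the YYYYMM layout of Date are all assumptions for illustration. The inflation adjustment scales each value by the 2018 CPI divided by the CPI of the row's own year.

```r
# Placeholder wide data: one row per year, one column per month and type.
# Column names and all numbers are made up for illustration.
wide <- data.frame(
  YEAR = c(2017, 2018),
  JANE = c(100, 110), JANI = c(200, 220),
  FEBE = c(105, 115), FEBI = c(190, 210)
)
cpi <- data.frame(YEAR = c(2017, 2018), CPI = c(245, 251))  # placeholder CPI

# Wide to long: one row per year, month, and type of trade.
value_cols <- setdiff(names(wide), "YEAR")
long <- data.frame(
  YEAR     = rep(wide$YEAR, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(wide)),
  value    = unlist(wide[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Attach the CPI of each row's year.
chrono <- merge(long, cpi, by = "YEAR")

# Split "JANE" into month ("JAN") and type ("E" or "I").
chrono$MONTH    <- substr(chrono$variable, 1, 3)
chrono$Month_No <- match(chrono$MONTH, toupper(month.abb))
chrono$Type     <- ifelse(substr(chrono$variable, 4, 4) == "E", 1L, 2L)

# Inflation adjustment to 2018 dollars.
base_cpi <- cpi$CPI[cpi$YEAR == 2018]
chrono$AdjustedValue <- chrono$value * base_cpi / chrono$CPI

# Date as an integer; the YYYYMM form is an assumption.
chrono$Date <- as.integer(chrono$YEAR * 100 + chrono$Month_No)

# Show the year only on January so the axis labels stay uncluttered.
chrono$TickNames <- ifelse(chrono$MONTH == "JAN",
                           paste(chrono$YEAR, chrono$MONTH),
                           chrono$MONTH)

chrono <- chrono[order(chrono$Date, chrono$Type), ]
head(chrono)
```

Rows from 2018 keep their original value, because their CPI equals the base CPI; earlier years are scaled up.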