Data Visualization
Note: This section is in progress. Commonly used graphs with coding examples from my research papers are included below.
Introduction to ggplot
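One of the few compliments that Stata users give R is that the graphs look great. This is in large part due to the ggplot2 package. This section gives a quick introduction to the structure of ggplot. First, we need to install the package. As before, type install.packages("ggplot2") in the console. I also like to use the package ggthemes to make my graphs look pretty, so you'll need to install that as well with install.packages("ggthemes"). The examples below assume both packages are loaded with library(ggplot2) and library(ggthemes), since scale_color_stata(), scale_fill_stata(), theme_tufte(), and theme_hc() all come from ggthemes.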
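As a minimal sketch of that structure: a ggplot call starts from a data frame and an aesthetic mapping, and then layers, labels, and a theme are added with `+`. The example below uses R's built-in mtcars data purely for illustration; it is not taken from any of the papers in this section.

```r
# install.packages(c("ggplot2", "ggthemes"))  # run once in the console
library(ggplot2)
library(ggthemes)

# Data and aesthetic mapping: which columns go on which axes
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +                      # layer 1: one point per row
  geom_smooth(method = "lm", formula = y ~ x) +  # layer 2: linear fit with a band
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  theme_tufte(base_size = 20)                    # theme from ggthemes

print(p)
```

Each `+` adds one component, so a layer can be dropped or swapped without touching the rest of the call.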
Scatterplot (geom_point)
Figure 2 from "Temporary Sales in Response to Demand Shocks" (with Ben Eden and Maya Eden)
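Note: Simplified to show the main parts.
# 'data' needs the columns sale and Units_sd.
# linewidth replaces size for lines in ggplot2 3.4.0 and later.
ggplot(data, aes(y = sale, x = Units_sd)) +
  # Linear fit with a confidence band (se = TRUE)
  geom_smooth(aes(color = 'blue line'), method = 'lm', linewidth = 1.2, se = TRUE) +
  geom_point(aes(color = 'red points'), alpha = 0.5) +
  # The color labels only name the two groups; the Stata palette picks the colors
  scale_color_stata() +
  guides(color = 'none') +  # hide the color legend
  labs(x = 'Std. Dev. of Log Units', y = 'Sale Frequency') +
  theme_tufte(base_size = 20)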
Line Graph (Simple, geom_line)
Panel (a) of Figure 3 from "Stay-at-Home Orders in a Fiscal Union" (with Mario Crucini)
Note: Simplified to show the main parts.
ggplot(data) +
  # Point estimate (solid line)
  geom_line(aes(x = EventTime, y = Estimate, color = 'Stay-At-Home Order'),
            linewidth = 1.2) +
  # Lower and upper bounds of the estimate (dashed lines)
  geom_line(aes(x = EventTime, y = EstLB, color = 'Stay-At-Home Order'),
            linewidth = 1, linetype = 2, alpha = 0.7) +
  geom_line(aes(x = EventTime, y = EstUB, color = 'Stay-At-Home Order'),
            linewidth = 1, linetype = 2, alpha = 0.7) +
  # Vertical reference line at event time -1
  geom_vline(xintercept = -1, linewidth = 0.7) +
  guides(color = 'none') +  # hide the color legend
  scale_color_stata() +
  labs(x = 'Days Relative to Treatment', y = 'Per. Pt. Change') +
  theme_hc()
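To try this layout without the paper's data, here is a sketch with a made-up data frame that has the same column names (EventTime, Estimate, EstLB, EstUB). The numbers and the constant standard error of 0.5 are invented for illustration. It also swaps in geom_ribbon() for the two dashed bound lines, which is a different presentation from the figure in the paper.

```r
library(ggplot2)

# Hypothetical event-study estimates; all values are made up
set.seed(1)
data <- data.frame(EventTime = -10:20)
data$Estimate <- ifelse(data$EventTime < 0, 0, -0.3 * data$EventTime) +
  rnorm(nrow(data), sd = 0.3)
data$EstLB <- data$Estimate - 1.96 * 0.5  # assumed standard error of 0.5
data$EstUB <- data$Estimate + 1.96 * 0.5

# Bounds drawn as a shaded band instead of two dashed lines
p <- ggplot(data, aes(x = EventTime, y = Estimate)) +
  geom_ribbon(aes(ymin = EstLB, ymax = EstUB), alpha = 0.2) +
  geom_line(linewidth = 1.2) +
  geom_vline(xintercept = -1, linewidth = 0.7)

print(p)
```

The dashed-line version keeps the figure readable in black and white; the ribbon tends to look cleaner when several series share a panel.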
Line Graph (Group Averages, stat_summary)
Figure from "Economic Impact of Black Lives Matter Protests" (with Craig Sylvera)
Note: Simplified to show the main parts.
ggplot(data, aes(x = Date, y = bus_open, group = any_violence,
                 color = factor(any_violence), fill = factor(any_violence))) +
  # Shaded band: group mean +/- 1 standard error on each date
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1),
               geom = 'ribbon', aes(color = NULL), alpha = 0.15) +
  # Line: group mean on each date
  stat_summary(fun.data = mean_se, geom = 'line', linewidth = 1.5) +
  geom_vline(xintercept = as.Date('2020-05-25'), linewidth = 1) +
  geom_hline(yintercept = 0, linewidth = 0.8) +
  scale_color_stata(labels = c('No', 'Yes')) +
  scale_fill_stata(labels = c('No', 'Yes')) +
  labs(y = 'Percentage Point Change from May 25th', x = '', fill = 'Violent Protest') +
  guides(color = 'none') +  # keep only the fill legend
  theme_hc()
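The ribbon layer relies on mean_se(), which ggplot2 applies to the y values at each x within each group. The sketch below calls it directly on a small made-up vector and checks the result against a by-hand calculation. With mult = 1 the band is the mean plus or minus one standard error, not a 95% interval; mult = 1.96 would approximate one under normality.

```r
library(ggplot2)

x <- c(2, 4, 6, 8)  # made-up values for illustration

# mean_se() returns a one-row data frame: y (mean), ymin, and ymax
out <- mean_se(x, mult = 1)

# The same quantities computed by hand
se <- sd(x) / sqrt(length(x))
manual <- c(mean(x) - se, mean(x), mean(x) + se)

print(out)
print(manual)
```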
Histogram (geom_histogram)
Panel (b) of Figure 2 from "Selection Effects in Retail Chain Pricing"
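Note: Simplified to show the main parts.
ggplot(data) +
  # For bars, linewidth sets the outline thickness (ggplot2 3.4.0 and later)
  geom_histogram(aes(x = var_chainweek, fill = 'chainweek'), color = 'black',
                 linewidth = 1, alpha = 0.2, bins = 15,
                 show.legend = FALSE) +
  scale_fill_stata() +
  # expression() renders the Greek letter gamma in the axis label
  labs(y = 'Count', x = expression(paste('Chain-Week Variance (', gamma, ')'))) +
  theme_hc()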