Data Visualization

Note: This section is in progress. Commonly used graphs with coding examples from my research papers are included below.

Introduction to ggplot

One of the few compliments that Stata users give R is that the graphs look great. This is in large part due to the ggplot2 package. This section gives a quick introduction to the structure of ggplot. First, we need to install the package. As before, type install.packages("ggplot2") in the console. I also like to use the ggthemes package to style my graphs, so install that as well with install.packages("ggthemes"). The examples below use scale_color_stata(), scale_fill_stata(), theme_tufte(), and theme_hc() from ggthemes, so load both packages with library(ggplot2) and library(ggthemes) before running them.
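As a rough sketch of that structure, assuming both packages are already installed: a ggplot call names a data frame and its aesthetic mappings, and layers are then added with +. The built-in mtcars data set and the axis labels are used here purely for illustration; they do not come from any of the papers below.

```r
# install.packages(c("ggplot2", "ggthemes"))  # run once in the console
library(ggplot2)
library(ggthemes)

# mtcars ships with R; it stands in for your own data frame
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +                # data + aesthetic mappings
  geom_point(alpha = 0.5) +                                # geom layer: one point per row
  labs(x = 'Weight (1000 lbs)', y = 'Miles per Gallon') +  # axis labels
  theme_tufte(base_size = 20)                              # theme from ggthemes

print(p)
```

Each later example follows the same pattern: ggplot() first, then one or more geoms, then scales, labels, and a theme.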


Scatterplot (geom_point)

Figure 2 from "Temporary Sales in Response to Demand Shocks" (with Ben Eden and Maya Eden)

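Note: Simplified to show the main parts.

ggplot(data, aes(y = sale, x = Units_sd)) +
  # Linear fit with a confidence band (se = TRUE);
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_smooth(aes(color = 'blue line'), method = 'lm', linewidth = 1.2, se = TRUE) +
  geom_point(aes(color = 'red points'), alpha = 0.5) +
  # Constant strings inside aes() let the Stata palette assign the colors
  scale_color_stata() +
  guides(color = 'none') +  # hide the legend for the placeholder labels
  labs(x = 'Std. Dev. of Log Units', y = 'Sale Frequency') +
  theme_tufte(base_size = 20)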
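The color = 'blue line' and color = 'red points' mappings above can look odd at first. A minimal sketch of what they appear to do, using the built-in mtcars data as a stand-in (an assumption for illustration, not the paper's data): each constant string becomes a level of a discrete color scale, and scale_color_stata() assigns palette colors to those levels in alphabetical order.

```r
library(ggplot2)
library(ggthemes)

# mtcars is a stand-in data set; the labels are arbitrary placeholder strings
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(aes(color = 'blue line'), method = 'lm', formula = y ~ x) +
  geom_point(aes(color = 'red points')) +
  scale_color_stata() +
  guides(color = 'none')  # the labels are placeholders, so hide the legend

# Inspect the color the scale assigned to each layer
line_color  <- unique(layer_data(p, 1)$colour)
point_color <- unique(layer_data(p, 2)$colour)
print(c(line = line_color, points = point_color))
```

The strings only need to sort in the order you want the palette applied; they never appear on the plot once the legend is hidden.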

Line Graph (Simple, geom_line)

Panel (a) of Figure 3 from "Stay-at-Home Orders in a Fiscal Union" (with Mario Crucini)

Note: Simplified to show the main parts.

ggplot(data) +
  # Point estimates (solid) with lower and upper bounds (dashed)
  geom_line(aes(x = EventTime, y = Estimate, color = 'Stay-At-Home Order'),
            linewidth = 1.2) +
  geom_line(aes(x = EventTime, y = EstLB, color = 'Stay-At-Home Order'),
            linewidth = 1, linetype = 2, alpha = 0.7) +
  geom_line(aes(x = EventTime, y = EstUB, color = 'Stay-At-Home Order'),
            linewidth = 1, linetype = 2, alpha = 0.7) +
  # Vertical reference line at event time -1; a constant, so set it outside aes()
  geom_vline(xintercept = -1, linewidth = 0.7) +
  guides(color = 'none') +
  scale_color_stata() +
  labs(x = 'Days Relative to Treatment', y = 'Per. Pt. Change') +
  theme_hc()


Line Graph (Group Averages, stat_summary)

Figure from "Economic Impact of Black Lives Matter Protests" (with Craig Sylvera)

Note: Simplified to show the main parts.

ggplot(data, aes(x = Date, y = bus_open, group = any_violence,
                 color = factor(any_violence), fill = factor(any_violence))) +
  # Shaded band: group mean plus or minus one standard error at each date
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1),
               geom = 'ribbon', aes(color = NULL), alpha = 0.15) +
  # Line: group mean at each date
  stat_summary(fun.data = mean_se, geom = 'line', linewidth = 1.5) +
  geom_vline(xintercept = as.Date('2020-05-25'), linewidth = 1) +
  geom_hline(yintercept = 0, linewidth = 0.8) +
  scale_color_stata(labels = c('No', 'Yes')) +
  scale_fill_stata(labels = c('No', 'Yes')) +
  labs(y = 'Percentage Point Change from May 25th', x = '', fill = 'Violent Protest') +
  guides(color = 'none') +
  theme_hc()
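To see what stat_summary() computes with fun.data = mean_se, here is a small sketch that reproduces the summary by hand. The vector x is made-up toy input standing in for one group's values on a single date; it is not from the paper.

```r
library(ggplot2)

# Toy values for one group on one date (illustrative only)
x <- c(1, 2, 3, 4, 5)

# Mean plus or minus one standard error, computed manually
se <- sd(x) / sqrt(length(x))
by_hand <- data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)

# The same summary from ggplot2's helper; mult scales the standard error
from_ggplot <- mean_se(x, mult = 1)

print(by_hand)
print(from_ggplot)
```

stat_summary() applies this function to the y values at each x position within each group: the ribbon geom draws ymin to ymax, and the line geom draws y.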


Histogram (geom_histogram)

Panel (b) of Figure 2 from "Selection Effects in Retail Chain Pricing"


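Note: Simplified to show the main parts.

ggplot(data) +
  # color and linewidth style the bar outlines; fill comes from the Stata palette
  geom_histogram(aes(x = var_chainweek, fill = 'chainweek'), color = 'black',
                 linewidth = 1, alpha = 0.2, bins = 15,
                 show.legend = FALSE) +
  scale_fill_stata() +
  # expression() renders gamma as a Greek letter in the axis label
  labs(y = 'Count', x = expression(paste('Chain-Week Variance (', gamma, ')'))) +
  theme_hc()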