Publication-Quality Plots in R Using ggplot2

BEFORE YOU BEGIN:

The videos below will walk you through the following .Rmd file:

Be sure you already have the data file from Ch. 9 saved.

This chapter reviews material on plotting with ggplot2 from previous chapters. You can download the data file and .Rmd files in each section to review.

22–A: Introduction and Learning Outcomes

ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame. It provides a more programmatic interface for specifying which variables to plot, how they are displayed, and general visual properties. As a result, we need only minimal changes if the underlying data change or if we decide to switch from a bar plot to a scatterplot. This helps in creating publication-quality plots with minimal adjustment and tweaking.
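As a rough sketch of what "minimal changes" means in practice, the code below draws the same data first as a bar plot and then as a scatterplot. The data frame `dose_response` and its values are invented purely for illustration; the syntax itself is explained in Section 22–C.

```r
library(ggplot2)

# Hypothetical data, invented for illustration only
dose_response <- data.frame(
  dose = c(1, 2, 3, 4),
  response = c(2.5, 4.1, 5.8, 8.2)
)

# The data and the x/y mapping are specified once
base_plot <- ggplot(data = dose_response, mapping = aes(x = dose, y = response))

# Only the geom changes between the two versions
bar_version <- base_plot + geom_col()        # bar plot
scatter_version <- base_plot + geom_point()  # scatterplot

bar_version
scatter_version
```

Because the data and mapping live in `base_plot`, swapping the plot type is a one-word change.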

ggplot2 is included in the tidyverse package. In previous chapters, we installed the tidyverse package and learned how to "tidy" our data to get it into the correct format (for review, see Chapter 15). As a reminder, ggplot2 (like other tidyverse functions) works best with data in the ‘long’ format, i.e., a column for every variable and a row for every observation. Well-structured data will save you lots of time when making figures with ggplot2!
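To make the ‘long’ format concrete, here is a minimal sketch that reshapes a small wide table into long format. It assumes the tidyr package (part of the tidyverse) and its pivot_longer() function; the data frame `wide_heights` and its column names are hypothetical, and the approach used in Chapter 15 may differ.

```r
library(tidyr)

# Hypothetical wide-format data: one row per plant, one column per week
wide_heights <- data.frame(
  plant_id = c("p1", "p2", "p3"),
  week_1 = c(2.1, 1.8, 2.4),
  week_2 = c(3.0, 2.9, 3.5)
)

# Reshape so that each row is a single observation (one plant in one week)
long_heights <- pivot_longer(
  wide_heights,
  cols = c(week_1, week_2),
  names_to = "week",
  values_to = "height_cm"
)

long_heights
```

In the long version, `week` and `height_cm` are each a single column, so either one can be mapped directly to an axis or a color in ggplot2.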

In this module you will learn how to:

  1. Produce scatter plots, bar plots, and boxplots using ggplot.

  2. Set universal plot settings.

  3. Describe what faceting is and apply faceting in ggplot.

  4. Modify the aesthetics of an existing ggplot plot (including axis labels and color).

  5. Build complex and customized plots from data in a data frame.

22–B: The Importance of Data Visualization

Once you understand your data, you need to communicate your understanding to others. Making visuals and graphics that are self-explanatory, informative, and pretty to look at takes time and effort. In this chapter, you’ll learn some of the tools that ggplot2 provides for making them.

David McCandless recently spoke at a TEDx event about his work on data journalism and made a persuasive case for paying more attention to visualization:

https://www.youtube.com/watch?v=5Zg-C8AAIGg&t=1069s

22–C: General Format of Plots in ggplot2


ggplot graphics are built layer by layer by adding new elements. Adding layers allows for extensive flexibility and customization of plots.

To build a ggplot, we will use the following basic template that can be used for different types of plots:


ggplot(data = <DATA>, mapping = aes(<x>, <y>)) + ...

Note that ggplot() is a function that takes several inputs. The ggplot() call above, with its data and aes() arguments, is the minimum needed to set up a plot. On its own it draws only empty axes, so you will almost always add at least one layer to display the data.


In the format above, the text between the angle brackets < > is a placeholder that you replace with something specific to your data and choices.

data = the name of your data frame (the data set you have loaded into R)

aes = aesthetic mapping. aes() usually takes as inputs the columns to plot on the x and y axes
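As an illustration, here is the template filled in with `mtcars`, a data frame that ships with R. The choice of `mtcars` and of its columns `wt` and `mpg` is an assumption made for this sketch, not the chapter's data file, and geom_point() stands in for the "..." in the template.

```r
library(ggplot2)

# <DATA> becomes mtcars; <x> and <y> become two of its columns
ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()  # draw one point per row of mtcars
```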


You may want to give your plot a name by assigning it to an object, so that you can add layers to it later.

plot_example <- ggplot(data = <DATA>, mapping = aes(<x>, <y>))


A commonly added layer is the <GEOM_FUNCTION>().

geom functions = functions added to a ggplot that draw a geometric layer, which determines how the data are displayed. Common examples are geom_line() for lines, geom_boxplot() for boxplots, and geom_point() for scatter plots and dot plots.

To add a geometric layer to the base plot above, the code might look like the following:

plot_example + geom_boxplot(<>)
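A runnable version of this pattern might look like the sketch below. It uses the built-in `iris` data frame as a stand-in for your own data; the object name `boxplot_example`, the jitter settings, and the axis labels are illustrative choices rather than anything prescribed by this chapter.

```r
library(ggplot2)

# Base plot: data and mapping only, no layers yet
plot_example <- ggplot(data = iris, mapping = aes(x = Species, y = Sepal.Length))

# Add a boxplot layer; the geom inherits x and y from the base plot
boxplot_example <- plot_example + geom_boxplot()

# Further layers stack on top: the raw points, then clearer axis labels
boxplot_example +
  geom_jitter(width = 0.15, alpha = 0.4) +
  labs(x = "Species", y = "Sepal length (cm)")
```

Keeping the base plot in its own object means you can try several geoms without retyping the data and mapping.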


The R cheatsheet is your friend for making pretty plots with ggplot2. Searching online for the type of plot you are interested in, followed by "AND ggplot2", can also be really helpful. Finally, there are examples throughout this book of using ggplot2 for the analyses covered.


Illustration by Allison Horst

22–D: Plotting Correlations

We previously included code and videos on plotting correlations in Ch. 11.

22–E: Adding a Regression Line

We previously included code and videos on adding a regression line in Ch. 12.


22–F: Box Plots

We previously included code and videos on box plots in Ch. 16.


References

  1. Copyright (c) Data Carpentry, 2014-2021. https://datacarpentry.org


Interested in learning more?

1. https://ggplot2.tidyverse.org/

2. The R Graphics Cookbook https://r-graphics.org/

3. Ch. 28 in R for Data Science focuses on "Graphics for Communication": https://r4ds.had.co.nz/graphics-for-communication.html

4. More R instructional guides: https://www.statology.org/r-guides/ © 2021 Statology

5. http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/