# 12 Plotting Variables in R

## Plot Single Discrete / Qualitative Variables

You can plot discrete or qualitative variables using the following techniques:

• barplot
• pie (though it is generally a poor charting method, as discussed below)

We are going to use the diamonds dataset from the ggplot2 package for illustration.

``require(ggplot2)``
``data(diamonds)``

Summarize the data: find the frequency of each diamond color.

``diamond.colors = table(diamonds$color)``

Simple barplot

``barplot(diamond.colors)``

Order the bars by decreasing frequency

``diamond.colors = diamond.colors[order(diamond.colors, decreasing = TRUE)]``
``barplot(diamond.colors)``
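As an alternative to indexing with order, the same ordering can be sketched with sort, which returns the sorted table directly. This assumes the ggplot2 package is installed; the name sorted.colors is made up for this example.

```r
library(ggplot2)  # provides the diamonds dataset

# Frequency of each diamond color
diamond.colors <- table(diamonds$color)

# sort() orders the counts; decreasing = TRUE puts the most frequent color first
sorted.colors <- sort(diamond.colors, decreasing = TRUE)

barplot(sorted.colors)
```

Either approach gives the same bar order; sort is simply shorter to read.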

Create a palette of 7 colors from RColorBrewer.

``require(RColorBrewer)``

``blues = brewer.pal(7, "Blues")``

Apply the color palette to the barplot. The rev function reverses the palette, so the darkest shade goes to the first (most frequent) bar.

``barplot(diamond.colors, col = rev(blues))``
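If RColorBrewer is not available, a similar light-to-dark palette can be approximated with colorRampPalette from base R. The two endpoint colors and the name blues.base are assumptions chosen for illustration, and the result will not match the Brewer palette exactly.

```r
library(ggplot2)  # provides the diamonds dataset

# Color counts, most frequent first
diamond.colors <- sort(table(diamonds$color), decreasing = TRUE)

# Interpolate 7 shades between a light and a dark blue (assumed endpoints)
blues.base <- colorRampPalette(c("#DEEBF7", "#08306B"))(7)

# Reverse so the darkest shade goes to the tallest bar
barplot(diamond.colors, col = rev(blues.base))
```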

Tidy up the graph a little bit

Set the plot parameters

``par(oma = c(1, 1, 1, 1)) # oma: outer margin``
``par(mar = c(4, 5, 2, 1)) # mar: margin``
``barplot(diamond.colors,``
``        col = rev(blues), # color of the bars``
``        horiz = TRUE, # draw the bars horizontally``
``        las = 1, # keep the axis labels horizontal``
``        border = NA, # no borders on bars``
``        main = "Frequencies of Different Colors of Diamond", # title of the graph``
``        xlab = "Number of observations", # label along the x axis``
``        ylab = "Color of Diamond") # label along the y axis``

## Display Categorical Variable using Pie Chart

• Not recommended; use a bar chart instead (see why below).

``pie(diamond.colors, col = blues)``

It is hard to judge the relative sizes of the slices in a pie chart, while the bar chart clearly shows the differences. The help page for the pie function says:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.
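The dot chart mentioned in the help text can be sketched with the base R dotchart function. This is a minimal illustration, assuming ggplot2 is installed for the diamonds data; the title and axis label are made up for this example.

```r
library(ggplot2)  # provides the diamonds dataset

# Color counts in increasing order, so the largest value ends up at the top
diamond.colors <- sort(table(diamonds$color))

# dotchart expects a plain numeric vector, so drop the table class explicitly
dotchart(as.numeric(diamond.colors),
         labels = names(diamond.colors),
         pch = 19,
         xlab = "Number of observations",
         main = "Frequencies of Different Colors of Diamond")
```

Every point sits on the same horizontal scale, so the counts can be compared by position rather than by angle.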

## Plot Quantitative or Continuous Variables

You can plot continuous or quantitative variables using the following:

• histogram
• boxplot

## Histogram

``prices = diamonds$price``
``hist(prices, col = "orange")``

Specify the number of buckets (bins) to create along the x axis, which holds the values of the continuous variable. R treats a single number as a suggestion and picks nearby "pretty" break points.

``hist(prices, col = "orange", breaks = 100)``

Plot density (relative frequency) instead of counts

``hist(prices, col = "orange", breaks = 100, freq = FALSE)``

Add a normal distribution curve to the histogram

``curve(dnorm(x, mean = mean(prices), sd = sd(prices)), col = "darkblue", lwd = 2, add = TRUE)``

## Boxplot

A boxplot is useful for spotting outliers and judging the symmetry of a distribution. For this illustration, let's use the iris dataset that comes with R.

``data(iris)``
``str(iris)``

Take a subset of the iris dataset

``virginica = iris[iris$Species == "virginica", ]``

Simple boxplot

``boxplot(virginica$Sepal.Length)``
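To see the numbers behind the box and to compare groups, one possible sketch uses boxplot.stats and the formula interface of boxplot. The variable name stats, the fill color and the axis label are assumptions made for this example.

```r
data(iris)
virginica <- iris[iris$Species == "virginica", ]

# Five-number summary used to draw the box, plus any points beyond the whiskers
stats <- boxplot.stats(virginica$Sepal.Length)
stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # values drawn as individual outlier points

# One box per species, side by side, for comparing spread and symmetry
boxplot(Sepal.Length ~ Species, data = iris,
        col = "orange",
        ylab = "Sepal length (cm)")
```

A median line that sits off-center in the box, or whiskers of unequal length, suggests a skewed distribution.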

## Putting together boxplot, histogram and normal curve on same plot

``carats = diamonds$carat``
``hist(carats, col = "lightgrey", breaks = 100, freq = FALSE)``
``boxplot(carats, col = "orange", horizontal = TRUE, add = TRUE)``
``curve(dnorm(x, mean = mean(carats), sd = sd(carats)), col = "darkblue", lwd = 2, add = TRUE)``
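Drawing the boxplot on top of the histogram can hide some bars. An alternative sketch stacks the two in separate panels with layout, so the boxplot sits above the histogram on roughly the same x range. The panel heights and margins here are assumptions that may need tuning for your device.

```r
library(ggplot2)  # provides the diamonds dataset
carats <- diamonds$carat

# Two stacked panels: a short one for the boxplot, a taller one for the histogram
layout(matrix(c(1, 2), nrow = 2), heights = c(1, 3))

# Top panel: horizontal boxplot without axes; op keeps the previous margins
op <- par(mar = c(0, 4, 2, 1))
boxplot(carats, horizontal = TRUE, col = "orange",
        axes = FALSE, ylim = range(carats))

# Bottom panel: density histogram with a normal curve overlaid
par(mar = c(4, 4, 0, 1))
hist(carats, col = "lightgrey", breaks = 100, freq = FALSE,
     main = "", xlim = range(carats))
curve(dnorm(x, mean = mean(carats), sd = sd(carats)),
      col = "darkblue", lwd = 2, add = TRUE)

# Restore a single-panel layout and the previous margins
layout(1)
par(op)
```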