I illustrate methods for displaying data using the built-in dataframe ChickWeight, which contains the weights of individual chicks over time, fed different diets. This script file contains the command lines below.
The data are obtained with the data() function:
> data(ChickWeight)
I use a shorthand name, or alias, for convenience:
> chicks<-ChickWeight
Viewing / printing data
There are several ways the data can be displayed. The print() function displays the entire frame, as does simply typing in the dataframe name:
> print(chicks)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
etc.
or
> chicks
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
etc.
Alternatively, many times all that is needed is a peek at the first or last few lines:
> head(chicks)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
> tail(chicks)
    weight Time Chick Diet
573    155   12    50    4
574    175   14    50    4
575    205   16    50    4
576    234   18    50    4
577    264   20    50    4
578    264   21    50    4
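Both head() and tail() accept an n argument giving the number of rows to show, and functions such as dim(), str() and summary() give a compact overview without printing the rows themselves. A minimal sketch (dim(), str() and summary() are not used elsewhere in this section and are shown here only as one possible way to inspect the data):

```r
data(ChickWeight)
chicks <- ChickWeight

# First three rows only
head(chicks, n = 3)

# Dimensions: number of rows and number of columns
dim(chicks)

# Compact description of each column (type and first few values)
str(chicks)

# Numeric summaries for numeric columns, counts for factors
summary(chicks)
```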
Alternatively, a handy tool exists within RStudio that allows display and sorting of data. In the Environment window click on the gridded box to the far right (see arrow). The dataframe pops up in the upper left window. This can be used much like a spreadsheet, clicking on columns to sort, and even used as an editor (however, changes will only be saved if directed to a saved object; see the later discussion on writing/saving data).
Built-in graphing functions
A large number of built-in functions exist in R for generating graphs. I'll illustrate a few of them here to give you a feel.
Histograms
The hist() function can be used to generate histograms, for instance for the chick weights. The commands
> with(chicks,hist(weight))
> #finer cells
> with(chicks,hist(weight,breaks=20))
produce first a coarser- and then a finer-scaled histogram.
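When breaks is given as a single number, hist() treats it as a suggestion for the number of cells rather than an exact count, so the number of bars drawn may differ from the value requested. A small sketch for checking what was actually used, calling hist() with plot=FALSE so that it returns the cell boundaries and counts instead of drawing (the object name h is just for illustration):

```r
data(ChickWeight)
chicks <- ChickWeight

# Compute the histogram without drawing it
h <- with(chicks, hist(weight, breaks = 20, plot = FALSE))

h$breaks          # cell boundaries actually chosen
length(h$counts)  # number of cells actually used
sum(h$counts)     # total count, equal to the number of observations
```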
Box and whisker plots
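The boxplot() function provides a great deal of information quickly and is designed to plot box and whisker plots by specified groups read from the data. Each plot shows the median (the line inside the box), approximately the first and third quartiles (the edges of the box), whiskers extending to the most extreme points within 1.5 times the interquartile range of the box, and any points beyond the whiskers, which are plotted individually as potential outliers.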
The boxplot() function can be run with a data argument to specify the input dataframe, or using the with() function:
> boxplot(weight~Diet,data=chicks)
> #or
> with(chicks,boxplot(weight~Diet) )
This displays a separate plot for each diet group.
To draw a single plot for all the data, one approach is to create a dummy group that is the same for every row:
> chicks$dum<-1
> boxplot(weight~dum,data=chicks)
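As a possible alternative to the dummy group, boxplot() also accepts a single numeric vector, in which case it draws one box for all the values. A short sketch:

```r
data(ChickWeight)
chicks <- ChickWeight

# One box for all weights; no grouping variable is needed
with(chicks, boxplot(weight, ylab = "weight"))
```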
Box and whisker plots provide a quick way to visualize the data, and also to perform simple diagnostics, such as identifying heterogeneous variances, asymmetric distributions, and outliers (all evident for these data).
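The same diagnostics can also be checked numerically. The sketch below uses tapply() and boxplot.stats() (neither appears elsewhere in this section, and the object names vars and n_out are illustrative) to compute the variance of weight within each diet and to count the points that fall beyond the whiskers:

```r
data(ChickWeight)
chicks <- ChickWeight

# Variance of weight within each diet group
vars <- with(chicks, tapply(weight, Diet, var))
vars

# Number of points beyond the whiskers in each diet group
n_out <- with(chicks, tapply(weight, Diet,
                             function(w) length(boxplot.stats(w)$out)))
n_out
```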
Scatter plots
Scatter plots show relationships between 2 variables in a data set, and are provided by the plot() function. The standard input for a plot function is the data described by horizontal axis x and vertical axis y, so plot(x,y), where x and y are either directly provided as vectors of numbers or obtained from a dataframe. To take a simple example where there is an obvious relationship between x and y:
> x<-c(1,2,3,4,5)
> y<-3*x
> plot(x,y)
By default plot() displays only the specific data points given to the function (5 points in this example).
To display the points connected by a straight line (not always a good idea!) we add an option to the function:
> plot(x,y,type="l")
and to display both the line(s) and the points
> plot(x,y,type="b")
For a less trivial example, let's take the chick weight data; for a clearer picture we'll subset the data to take just the weight over time for a single chick for Diet 1:
> data(ChickWeight)
> chicks<-ChickWeight
> chick11<- subset(chicks,Diet==1 & Chick==1)
> with(chick11,plot(Time,weight))
To connect each of the sequential points by a straight line
> with(chick11,plot(Time,weight,type="l"))
and to display both the data points and the lines
> with(chick11,plot(Time,weight,type="b"))
This last plot is not particularly useful; what we really want is to see how well a linear relationship describes these data. As we'll see later in the course, a standard approach is to fit a linear regression model using the lm() function. Here I illustrate how easy it can be to combine a statistical analysis with a graphing procedure. We will start by re-plotting the data points:
> with(chick11,plot(Time,weight))
We then create an object that is the result of fitting a linear regression model to the data subset, in which weight is the dependent (y) variable predicted as having a linear relationship with the predictor (x) variable, Time.
> z <- lm(weight ~ Time, data = chick11)
The regression coefficients (intercept and slope) estimated by lm() are contained in z and used with the abline() function to draw a straight line, which is superimposed on the plot
> abline(coef = coef(z))
The resulting plot shows the data points with the fitted regression line superimposed.
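To see what was fitted, the coefficients can be printed, and summary() reports R-squared as one rough measure of how well the straight line describes the data. A minimal sketch; note that abline() also accepts the fitted model object directly, which for this simple regression draws the same line as passing coef(z):

```r
data(ChickWeight)
chicks <- ChickWeight
chick11 <- subset(chicks, Diet == 1 & Chick == 1)

z <- lm(weight ~ Time, data = chick11)

coef(z)                # intercept and slope
summary(z)$r.squared   # proportion of variance explained by the line

with(chick11, plot(Time, weight))
abline(z, col = "red") # same line as abline(coef = coef(z))
```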
Customizing your graphics
Sometimes the way the plots or other graphics are displayed is satisfactory, but in many other cases we need or will want to modify axis labels, add legends or titles, or change colours on the graphs that are displayed. In the plots above, the defaults for plot() provide x and y axis labels that are the names of the variables in the data frame; plotted lines and points are black, and there is no title. By changing the defaults in the plot() and abline() functions we can 1) add a title, 2) modify the axis labels, and 3) change the line colour to red (for example).
> ## add a title, axis labels and colours
> with(chick11,plot(Time,weight,main="Chick 1 Diet1",xlab="Time in Days",ylab="Mass in g"))
> z <- lm(weight ~ Time, data = chick11)
> abline(coef = coef(z),col="red")
In a somewhat more sophisticated example, we can pull values for the title from the data and (using the paste() function) insert these into the title character string. Applying this to Chicks 1 and 2 on Diet 1 we have
> #create title from data
> with(chick11,plot(Time,weight,main=paste("Chick",Chick[1],"Diet",Diet[1]),xlab="Time in Days",ylab="Mass in g"))
> z <- lm(weight ~ Time, data = chick11)
> abline(coef = coef(z),col="red")
and
> #a different subset
> chick12<- subset(chicks,Diet==1 & Chick==2)
> with(chick12,plot(Time,weight,main=paste("Chick",Chick[1],"Diet",Diet[1]),xlab="Time in Days",ylab="Mass in g"))
> z <- lm(weight ~ Time, data = chick12)
> abline(coef = coef(z),col="red")
As you get more skilled in programming in R, you'll see that code like this could fit naturally into a program that could "loop" over the chick and diet combinations and automatically produce labeled graphs for each combination, using only a few lines of code.
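A minimal sketch of such a loop, assuming we only want the first three chicks (all of which are on Diet 1). The helper name plot_chick and the choice of chicks are illustrative and not part of the original script:

```r
data(ChickWeight)
chicks <- ChickWeight

# Plot one chick with its fitted regression line.
# Returns the regression coefficients invisibly.
plot_chick <- function(d) {
  with(d, plot(Time, weight,
               main = paste("Chick", Chick[1], "Diet", Diet[1]),
               xlab = "Time in Days", ylab = "Mass in g"))
  z <- lm(weight ~ Time, data = d)
  abline(coef = coef(z), col = "red")
  invisible(coef(z))
}

# Loop over chicks 1 to 3, producing one labelled plot per chick
for (id in c(1, 2, 3)) {
  d <- subset(chicks, Chick == id)
  plot_chick(d)
}
```

In an interactive session each new plot replaces the previous one in the plot window; calling par(mfrow = c(1, 3)) before the loop would instead place the three plots side by side.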