Writing dataframes to external files
Once you have data in a dataframe (discussed in more detail in 2), you can perform various transformations, create new variables, select subsets of the data, and do other manipulations (we will see lots of examples of these actions). However, when you end your R session these changes will be gone unless you save them in some fashion (the original data will, of course, still be in the csv file you read them from, unless you deleted this file). A very simple device is to write the results to a csv file using write.csv (essentially the reverse of reading data into a data frame with read.csv).
Let's take the example of the chick weight data from earlier. Suppose we created the chick11 dataframe as before. As long as the R session is still open and we haven't removed the results (with the rm command), we still have access to them as an R object, the data frame chick11. We can write the data to a csv file with the command
> write.csv(chick11,file="chick11.csv",row.names=F)
(the row.names=F option avoids an additional column of row numbers, which is usually unnecessary). The file "chick11.csv" (or whatever name we assign) will be written to our working directory. The file can be opened either as a spreadsheet or as a text file; open it as a text file in the Notepad editor to see its layout.
In each row, the entries are separated by commas, and character variables are (by default, which can be changed) enclosed in quotes. Also by default the first row is a header, which contains the names of the columns (variable names).
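The raw text of the file can also be inspected without leaving R. The following is a minimal sketch, assuming chick11 is built as in the earlier example; writing to a temporary file is a choice made here only to keep the working directory clean.

```r
# Rebuild the example data frame (as earlier in these notes)
data(ChickWeight)
chick11 <- subset(ChickWeight, Diet == 1 & Chick == 1)

# Assumption: write to a temporary file rather than the working directory
csv_file <- tempfile(fileext = ".csv")
write.csv(chick11, file = csv_file, row.names = FALSE)

# Read the file back as plain text lines: one header line plus one line per row
csv_lines <- readLines(csv_file)

# Show the header and the first few data rows exactly as they are stored
print(head(csv_lines, 4))
```

Looking at the lines as text makes the comma separators and the quoting of the header visible in a way that the spreadsheet view hides.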
We can confirm the contents of this file by reading it back into R as a dataframe using read.csv(), which is essentially the reverse of write.csv:
> read.csv(file="chick11.csv")
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
7     106   12     1    1
8     125   14     1    1
9     149   16     1    1
10    171   18     1    1
11    199   20     1    1
12    205   21     1    1
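One caveat on "reverse": a csv file stores only text, so column types are re-guessed when the file is read. In ChickWeight the Chick and Diet columns are factors, and after a round trip through csv they typically come back as plain numbers. The sketch below (again using a temporary file, which is an assumption of this example) compares the column classes and converts the two columns back to factors.

```r
data(ChickWeight)
chick11 <- subset(ChickWeight, Diet == 1 & Chick == 1)

# Round trip through a csv file (temporary file used for illustration)
csv_file <- tempfile(fileext = ".csv")
write.csv(chick11, file = csv_file, row.names = FALSE)
chick11_csv <- read.csv(csv_file)

# Compare the class of each column before and after the round trip
print(sapply(chick11, function(x) class(x)[1]))
print(sapply(chick11_csv, function(x) class(x)[1]))

# Convert back to factors if later analyses need them
chick11_csv$Chick <- factor(chick11_csv$Chick)
chick11_csv$Diet  <- factor(chick11_csv$Diet)
```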
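Note that by default the write.csv command will overwrite any existing contents in the output file. write.csv does not accept an append option (it ignores append, with a warning). If instead you want to append data to existing contents, use write.table with the options append=T, sep=",", and col.names=F (so that the header row is not repeated).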
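As a sketch of appending rows to an existing csv file: the second subset (chick12) and the temporary output file are hypothetical names introduced only for this illustration.

```r
data(ChickWeight)
chick11 <- subset(ChickWeight, Diet == 1 & Chick == 1)
chick12 <- subset(ChickWeight, Diet == 1 & Chick == 2)  # hypothetical second subset

# Assumption: a temporary file stands in for a file in the working directory
out_file <- tempfile(fileext = ".csv")

# First write creates the file, including the header row
write.csv(chick11, file = out_file, row.names = FALSE)

# Append the second subset: comma separator, no row names, no repeated header
write.table(chick12, file = out_file, append = TRUE, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Read the combined file back to confirm both subsets are present
combined <- read.csv(out_file)
print(nrow(combined))
```

The appended rows must have the same columns, in the same order, as the rows already in the file, since nothing in the csv format checks this for you.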
Save R objects to files
Objects that exist in your R session can generally be saved to external files and accessed later. This can be very handy, since it provides a convenient way to "shelve" your work and retrieve it later. Examples of objects that you might want to save include dataframes and the results of analyses (such as a fitted regression), both of which are illustrated below.
The save() function is used to save R objects to external files, and the load() function is used to retrieve saved objects. As with write.csv() and read.csv(), R will by default write to and read from your specified working directory, so be sure this is set correctly.
A couple key points on saved R objects:
As a first example to illustrate save() and load(), take the subsetted chick weight data considered earlier.
> data(ChickWeight)
> chicks<-ChickWeight
> chick11<- subset(chicks,Diet==1 & Chick==1)
We can save the dataframe as an R object
> #save as an R object
> #save a dataframe
> save(chick11,file="chick11.Robject")
Then if we delete the original object we can get it back
> #delete the original object
> rm(chick11)
> chick11
Error: object 'chick11' not found
Yup, it's gone!
> #load back in from file
> load("chick11.Robject")
Now it's back
> chick11
Grouped Data: weight ~ Time | Chick
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
7     106   12     1    1
8     125   14     1    1
9     149   16     1    1
10    171   18     1    1
11    199   20     1    1
12    205   21     1    1
>
As a somewhat more complex example, let's run the earlier regression of weight versus time for the subset, and save this as an object.
> #save an analysis object
> z <- lm(weight ~ Time, data = chick11)
> save(z,file="regression.Robject")
Nuke the object-- gone!
> rm(z)
> z
Error: object 'z' not found
> load("regression.Robject")
> #object is back
> z
Call:
lm(formula = weight ~ Time, data = chick11)
Coefficients:
(Intercept)         Time
     24.465        7.988
> names(z)
[1] "coefficients" "residuals" "effects" "rank" "fitted.values"
[6] "assign" "qr" "df.residual" "xlevels" "call"
[11] "terms" "model"
>
So-- the object is back, along with all of its elements.
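The same steps can be combined and checked in one short script. This is a sketch under a few assumptions: both objects are saved in a single temporary file, and the copies kept for comparison (chick11_copy, coef_copy) are hypothetical names used only here.

```r
data(ChickWeight)
chick11 <- subset(ChickWeight, Diet == 1 & Chick == 1)
z <- lm(weight ~ Time, data = chick11)

# save() accepts several objects at once; a temporary file is used for illustration
obj_file <- tempfile(fileext = ".RData")
save(chick11, z, file = obj_file)

# Keep copies for comparison, then remove the originals
chick11_copy <- chick11
coef_copy <- coef(z)
rm(chick11, z)

# load() restores the objects under their original names
# and returns those names as a character vector
restored <- load(obj_file)
print(restored)

# Check that what came back matches what was saved
print(identical(chick11$weight, chick11_copy$weight))
print(all.equal(coef(z), coef_copy))
```

Because load() restores objects under the names they had when saved, it will silently overwrite any object of the same name already in your session.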
Saving plots and other graphics
If you are working in RStudio, there are a couple of choices for saving plots to graphics files (jpeg, bmp, tiff, etc.) or print files (e.g., pdf). I'll illustrate with the last plotting example for chick weights.
If we are content to save graphs one at a time (say, because we only have a few), the easiest approach is to export from the Plots window. First re-run the plotting commands
> #method 1 -- from graphics console
> with(chick11,plot(Time,weight,main=paste("Chick ",Chick[1],"Diet",Diet[1]),xlab="Time in Days",ylab="Mass in g"))
> z <- lm(weight ~ Time, data = chick11)
> abline(coef = coef(z),col="red")
Then go to the Plots window (lower right) in RStudio, click Export, and choose to save as an image or as a pdf; then select the image format and enter a filename (one is provided by default).
This approach, while fine for a few files, gets clunky if you have many. You have a couple of alternatives. One is to revert to the regular R console, in which the command savePlot can be invoked, which in turn can accept input from within a programming loop. I found a nicer approach online that works within RStudio: open a file graphics device (here jpeg() or pdf()) so that plotting output goes to the file instead of the RStudio Plots window, then close the device with dev.off() to finish writing the file. I illustrate it here with the chick weight example, creating files in 2 formats (jpeg and pdf).
#method 2 -- direct command to save (works in loops)
jpeg(filename="Chickplot.jpg")
with(chick11,plot(Time,weight,main=paste("Chick ",Chick[1],"Diet",Diet[1]),xlab="Time in Days",ylab="Mass in g"))
z <- lm(weight ~ Time, data = chick11)
abline(coef = coef(z),col="red")
dev.off()
#as pdf
pdf(file="Chickplot.pdf")
with(chick11,plot(Time,weight,main=paste("Chick ",Chick[1],"Diet",Diet[1]),xlab="Time in Days",ylab="Mass in g"))
z <- lm(weight ~ Time, data = chick11)
abline(coef = coef(z),col="red")
dev.off()
See this nice blog for more details on this approach, including saving files in other formats and inserting these commands in a loop.
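As a sketch of how method 2 fits inside a loop, the code below writes one pdf per chick. The choice of chicks 1 to 3, the file name pattern Chickplot_<id>.pdf, and the temporary output folder are all assumptions made for this illustration.

```r
data(ChickWeight)
chicks <- ChickWeight

chick_ids <- c(1, 2, 3)   # hypothetical choice: three chicks on Diet 1
out_dir <- tempdir()      # assumption: write to a temporary folder

plot_files <- character(0)
for (id in chick_ids) {
  one_chick <- subset(chicks, Diet == 1 & Chick == id)
  z <- lm(weight ~ Time, data = one_chick)

  # Build a distinct file name for each chick so files are not overwritten
  plot_file <- file.path(out_dir, paste0("Chickplot_", id, ".pdf"))

  pdf(file = plot_file)   # open the file device
  with(one_chick, plot(Time, weight, main = paste("Chick", id, "Diet 1"),
                       xlab = "Time in Days", ylab = "Mass in g"))
  abline(coef = coef(z), col = "red")
  dev.off()               # close the device so the file is written to disk

  plot_files <- c(plot_files, plot_file)
}

print(basename(plot_files))
```

Each pass through the loop needs its own dev.off(); if a device is left open, later plots keep going to that file instead of to the Plots window.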
Saving to relational databases
Often, data will reside in Access, dBase, or one of the other common relational database management systems (RDBMS). These have a number of advantages for large, complex problems, including better memory handling and faster queries (searches) than programs such as R or SAS. Many RDBMS also have convenient front ends that allow for ease of data entry (e.g., drop-down menus and auto-fill features that force entries to be in acceptable ranges). If we have time and students are interested, we may discuss specific RDBMS in more detail later in the course.