[r] resources, tips and what-not - for Curtis

Code to Remember - should probably have these memorized by now.

Linetypes and Shape Types in R

Example of a for loop
just a reminder of the basic syntax
`for (i in VECTOR) {`
`    WHATNOT HAPPENS with "i"`
`}`
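
As a concrete instance of the skeleton above, here is a minimal sketch that sums the squares of a vector. The names `values` and `total` are made up for illustration.

```r
# Hypothetical example: accumulate a running total over a vector
values <- c(1, 2, 3, 4)
total <- 0
for (i in values) {
  total <- total + i^2   # "i" takes each value of the vector in turn
}
total   # 30
```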

Example of while loop
`while (i != TRUE) {`
`    WHATNOT HAPPENS with "i"        #keeps doing this until i == TRUE,`
`                                    #i.e. if i == TRUE, the loop stops`
`}`

Example of if statement

`  if ( -SOME STATEMENT THAT IS TRUE OR FALSE- ) {`
`      IF TRUE, DO THIS STUFF`
`  } else {      #keep "else" on the same line as the closing brace`
`      IF THAT STATEMENT WAS FALSE, THEN DO THIS`
`  }`

Example of if and else if statement
I like organizing my multiple if statements like this:

`A<-2`
`if(A==1){B<-1} else`
`if(A==2){B<-2} else`
`if(A==3){B<-3}`

`    this means that B is assigned to 2`

Multiple Conditional Statements

`    AND:`
`if (x>2 && x<5) { do this }`

`    OR:`
`if (x<2 || x>5) { do this }`
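
One thing worth keeping straight (a sketch, with `x`, `y` and `msg` made up for illustration): `&` and `|` compare element by element and return a vector, while `&&` and `||` return a single TRUE/FALSE, which is what `if` expects.

```r
x <- c(1, 3, 6)
in_range <- x > 2 & x < 5      # element-wise: FALSE TRUE FALSE
in_range

y <- 3
if (y > 2 && y < 5) {          # single TRUE/FALSE for the if statement
  msg <- "y is between 2 and 5"
} else {
  msg <- "y is outside the range"
}
msg   # "y is between 2 and 5"
```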

Where "x axis" is a vector (or multiple vectors) to be plotted on the x axis, from minimum to max.
"y axis" places those points vertically. ylim sets the range of the y-axis (useful, but not totally necessary). main sets the main title.

`plot(x axis,y axis, type="n", main=paste(X1,"- Continuous Time - Color=Player",sep=" "), ylim=c(0,1),`
`         ylab="action choice", xlab="time"`
`    )`

Add data on top of a PLOT
lines() -- adds lines on top of existing plot.

Hist - hist - Histogram basics

Example: rolling two d4 dice
The following are possible: c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)
`x = c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)`
`hist(x)`
`#this produces a nice histogram`

Adding Breaks to Histogram: hist(x,bins), where "bins" suggests a number of bins, but doesn't actually set it. You'll want to designate breaks. These set the start and end points for each bin.

`breaks = c(0,1,2,3,4,5,6,7,8,9)`
`hist(x,breaks)`

produces the hist you wanted!
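
To check what the breaks are doing without looking at the picture, one option is to ask hist() for the counts instead of the plot. This is a small sketch. Building `x` with `outer()` is my own shortcut for the same 16 dice sums listed above.

```r
# All 16 equally likely sums of two four-sided dice
x <- as.vector(outer(1:4, 1:4, "+"))
breaks <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = breaks, plot = FALSE)
h$counts   # 0 1 2 3 4 3 2 1 0
```

Each bin includes its right end point by default, so the single roll of 2 lands in the bin from 1 to 2.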

Density Hist (instead of the frequency hist above)
`hist(BMI, freq=FALSE, main="Density plot")`

ggplot2 Hist

`HistGram <- qplot(VARIABLE, data=mydata, geom="histogram", binwidth=1)`

ggplot2 relative frequency histogram (harder than it should be!!!)
`ggplot(dataframe, aes(x=x)) + `
`  geom_histogram(aes(y=..count../sum(..count..)),`
`                 binwidth=0.05,`
`                 colour="#808080") +`
`  geom_vline(xintercept = c(0.25,0.75),`
`             colour="#808080",`
`             linetype="longdash") +`
`  xlim(0,1) +`
`  #facet_wrap(~ treatment) + #adds mult hists`
`  theme(strip.text.x = element_text(size = 12)) + #nice look`
`  labs(title = "", x="X-Axis Label", y="relative frequency")`

Adding a LINE to a plot or hist - abline
`abline(h = 0.05)`
`#adds a horizontal line at 0.05`

`abline(v = 1.5)`
`#adds a vertical line at 1.5`

Adding a subtitle to a PLOT or histogram
this adds the following just below the main title, and just above the plot/histogram.
I've made it give statistics info too, average payoffs of players.
`  mtext((paste`
`         ("Red Payoffs: ",round(payoffs,digits=2),`
`          " - Blue Payoffs: ",round(payoffs,digits=2),`
`          " - Black Payoffs: ",round(payoffs,digits=2),`
`          " - Orange Payoffs: ",round(payoffs,digits=2))  `
`          )`
`        )`

More complicated text for plot title and elsewhere
It lets you put in basic notation, and reports numeric values in the chart itself.

Special Plot/Hist/Graph Layouts
`layout()` - a little hard to explain - create a matrix, (3x3 below) and indicate where each graph/hist/plot is to go.

`layout(matrix(c(1,1,1,2,3,4,5,5,5), 3, 3, byrow = TRUE))`

layout(matrix(c(layout), nrow, ncols, byrow = TRUE))
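
A small sketch of the 3x3 layout in use. The `pdf(NULL)` call is an assumption on my part, just so the example draws to a throwaway device. Drop it to see the plots on screen.

```r
pdf(NULL)   # assumption: throwaway device, so no file or window is needed

# 3x3 grid: plot 1 spans the top row, plots 2-4 fill the middle, plot 5 spans the bottom
n_plots <- layout(matrix(c(1,1,1,2,3,4,5,5,5), 3, 3, byrow = TRUE))

for (k in 1:n_plots) {
  plot(1:10, main = paste("Plot", k))   # each call fills the next numbered slot
}
dev.off()
n_plots   # 5
```

layout() hands back the number of figures it set up, which is handy for looping.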

data.frame()
Create a matrix with a header, column label "names".
`data<-data.frame(cbind(secondsLeft=0,subject=1:4),strategy0=sample(100, 4, replace=TRUE),payoff=0)`
`  secondsLeft subject strategy0 payoff`
`1           0       1        75      0`
`2           0       2        96      0`
`3           0       3        75      0`
`4           0       4        58      0`

Matrix Operations Stuff
Create:
`a <- matrix(1,2,4)#creates row:2 by column:4 matrix, values 1`

Call a Row
`a[1,]#this called row 1`

Call a Column
`a[,3]#this called column 3`

Call multiple columns
`a[,2:4]#this called columns 2, 3 & 4 - so it returns a matrix of 3 columns`

Convert String Name Into Callable Variable
i.e., given a variable or data frame name in string form ("a"), call the variable a.
Example:
`a=c(1,2,3,4)`

`get("a")`
` 1 2 3 4`
~surprisingly useful!

Sorting matrix by multiple (or one) columns

`SET-UP`
`DF <- data.frame(col1 = c(1,2,3,4,5,6,7,8), `
`                 col2 = c("a","a","b","b","z","z","m","n"), `
`                 col3 = c(2,3,1,8,5,3,2,9))`

Here, sort DF by column 2 alphabetically, then (within ties) by column 3 descending
`DF[with(DF,order(col2, -col3)),]`

Here, sort DF by column 3 ascending.
`DF[with(DF,order(col3)),]`
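
To see which key wins, here is the two-column sort run on the set-up data. It is a sketch, and `sorted` is just a name I picked.

```r
DF <- data.frame(col1 = c(1,2,3,4,5,6,7,8),
                 col2 = c("a","a","b","b","z","z","m","n"),
                 col3 = c(2,3,1,8,5,3,2,9))

# col2 ascending is the primary key; -col3 breaks ties with the larger value first
sorted <- DF[with(DF, order(col2, -col3)), ]
sorted$col1   # 2 1 4 3 7 8 5 6
```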

What does "lty" mean?
lty      "line type"
seven pre-set styles, specified by either their integer or name  (default = “solid”)
 0 blank, 1 solid, 2 dashed, 3 dotted, 4 dotdash, 5 longdash, 6 twodash

Export - write to csv file
Step 1. set your working directory (the .csv file will be sent there).
Step 2. make sure the data you want to export is structured in a way the csv file can deal with (i.e., an nXm matrix, perhaps with a names row).

`write.table(data,file="NameMe.csv",sep=",",row.names=FALSE)`

`# "data" is the matrix or data.frame you want to export. You can export a subset() or a data[2:4] subset. `
`# file="BlahBlah.csv" is the name of the output file`
`# sep="," tells it to save as a comma separated .csv file (other options: tab, space etc)`
`# row.names=FALSE removes the row number index column from the output file. you want this. `

Post Script
Error: "`cannot open file 'BlahBlah.csv': Permission denied`" probably means you have that output file opened.
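
A quick round trip to confirm the export looks right. This is a sketch with made-up data, and it writes to a temporary file (my assumption) rather than the working directory.

```r
data <- data.frame(subject = 1:4, payoff = c(0.5, 1.25, 0, 2))

# Assumption: write to a temporary file rather than the working directory
out_file <- tempfile(fileext = ".csv")
write.table(data, file = out_file, sep = ",", row.names = FALSE)

# Read it back to confirm the columns and values survived
check <- read.csv(out_file)
check
```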

Subset with Multiple "IN" statements
`#Where L1, L2 and L3 could be numbers, or characters (depending on the `
`# vector type of ColName)`
`subset(data, subset = ColName %in% c(L1,L2,L3))`

Step 1. Go to the google docs file
> File > Publish to web
> Sheets to publish > 1 Sheet
> Get a link to the published data > CSV (comma.....)
> copy and paste the full URL
e.g. "`https://docs.google.com/spreadsheet/pub?key=0AqJiu95E6EGfdEVSeE80Y1dPUmZDbkRMZ1JRbzlGd2c&output=csv`"
Let's call that URL just `URL`.
Step 2. Make sure you have the RCurl package installed. `install.packages("RCurl")`

`library(RCurl)`

`options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))`

`Myurl <- getURL("URL_see_step_1_above")`
`MyData <- read.csv(textConnection(Myurl))`

And now `MyData` is in R.

Post Scripts
(1)  options(...): this may change as google docs changes its standards. This worked as of March 2013.
(2)  Keep in mind that as you update your google doc, that URL isn't updated. The URL you get in step 1 is the version of the google doc retrieved when you hit "publish data" (at least that's been the case in my experience).
Thus you may need to re-publish your google doc each time you want to import an updated version of it into R.

Return Multi-Line Paste Text
Suppose you have a function, and you want to return/print/paste something over multiple lines:
cat()

`  cat(`
`    paste(`
`      paste("ID is:",Variable_1, sep=" "),`
`      paste("Name is ",Variable_2, sep=" "),`
`      paste("Some other next line:",Variable_3, sep=" "),`
`      sep=" \n"`
`    )`
`  )`

Convert a Table into a nice clean matrix
As in, the data the hist() function uses to make a histogram.
If you check out the table() function, it's its own data type (not matrix).

`as.data.frame(mytable)`
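
For example, with the dice sums from the histogram section (a sketch, where `counts` is a name I made up):

```r
x <- c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)
mytable <- table(x)               # class "table", not a matrix or data frame
counts <- as.data.frame(mytable)  # one row per distinct value

names(counts)   # "x" "Freq"
counts$Freq     # 1 2 3 4 3 2 1
```

Note that the value column comes back as a factor, so convert it with `as.numeric(as.character(...))` if you need numbers.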

plyr package
a package for all sorts of sorting subsetting and stuffs

install
`install.packages("plyr")`
`library(plyr)`

plyr arrange() - sort, and multisort
for my purposes, a function to sort data via column.
`arrange(data,...)`
`data` is the data.frame you want to work with.
`...` is the rest: the columns to sort by, separated by commas. Whatever is first is the primary sort key; later columns only break ties.
default sort is ascending (A=>Z or -100=>0=>100)
to select descending sort, `desc(ColName)`

Example
`dd <- data.frame(b = factor(c("Aa", "Ba", "Ba", "Zz"),`
`                            levels = c("Aa", "Ba", "Zz"), ordered = TRUE),`
`                 x = c("A", "D", "A", "C"),`
`                 y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))`

Example of plyr arrange: sort dd, have column "z" sorted descending, with column b sorted by level.
`> arrange(dd,desc(z),b)`
`   b x y z`
`1 Zz C 9 2`
`2 Aa A 8 1`
`3 Ba D 3 1`
`4 Ba A 9 1`

Example of plyr arrange: have column "y" sorted from least to most, with b sorted from least level to highest.
`> arrange(dd,y,b)`
`   b x y z`
`1 Ba D 3 1`
`2 Aa A 8 1`
`3 Ba A 9 1`
`4 Zz C 9 2`
`> arrange(dd,y,desc(b))`
`   b x y z`
`1 Ba D 3 1`
`2 Aa A 8 1`
`3 Zz C 9 2`
`4 Ba A 9 1`

Clock Your Code - time how long your code takes to run.  (notes)
The first line starts the clock, and the second stops it, and returns a report. This is basically a stopwatch, how much time it took to go from the first line to the last.
`ptm <- proc.time()`
`{ insert code you want to clock }`
`proc.time() - ptm`
`   user  system elapsed `
`   1.05    0.02    1.06 `
• user time is the CPU time spent running your code itself
• system time is the CPU time the operating system spent on your code's behalf (e.g. reading files, allocating memory)
• elapsed time is the wall-clock time since you started the stopwatch

Returns the time to run a function
`system.time(function(args))`
`e.g.`
`system.time(sum(mtcars$mpg))`

Take a Random Sample of Rows from a Data.Frame (matrix)
If you want to take a small sample of a larger dataset (say, to test code), and you don't want to use head(data,nrows), then use:

`mydata #this is a dataframe with many rows and perhaps many columns`
`temp <- mydata[sample(nrow(mydata),3),]  #this creates a new data.frame of 3 rows, randomly sampled from mydata.`
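
A runnable sketch with made-up data. The `set.seed()` call is my addition so the same rows come back every run.

```r
set.seed(42)                                   # assumption: fixed seed so the draw is repeatable
mydata <- data.frame(id = 1:100, value = rnorm(100))

temp <- mydata[sample(nrow(mydata), 3), ]      # 3 rows, sampled without replacement
nrow(temp)   # 3
```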

Drop Columns from Data.Frame, by Name (notes)

``````data <- data.frame(
x=1:10,
y=10:1,
z=rep(5,10),
a=11:20
)
drops <- c("x","z")
data <- data[,!(names(data) %in% drops)]``````

Alternatively, Dropping a column from a data frame by column name
Given the data frame "DF" and the column "ColumnX"

`DF <- subset(DF, select = -c(ColumnX))`

Selecting Only A Set of Columns from a Data Frame, by Name
Given the data frame "DF" and you want to select the columns "Column1" and "ColumnY"

`DF <- subset(DF, select = c("Column1", "ColumnY"))`

Select Zero Rows From a Data Frame (that is, create an empty version of another dataframe)
``NuDF <- DF[0,]``

Rename Column Names, by Name
There's a plyr function, rename(), that I've always had issues with, so here's my fix:

`data.frame called "mydata"    with a column "Column1"    you want to rename "Column1_old"`
`names(mydata)[which(names(mydata)=="Column1")] = "Column1_old"# that's it`

Take a large data.frame, and keep only the first row for each unique observation of a value in one column.
If "temp" is your dataframe (with many rows) and ID is the variable that you want only one observation for:

`temp <- temp[!duplicated(temp$ID),]`
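
A tiny worked case (a sketch, where the data and the name `first_only` are invented):

```r
temp <- data.frame(ID = c("a", "b", "a", "c", "b"),
                   score = c(10, 20, 30, 40, 50))

# duplicated() flags the second and later appearances of each ID
first_only <- temp[!duplicated(temp$ID), ]
first_only$score   # 10 20 40
```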

Remove Columns from a Data Frame by Name
`MyData `
`names(MyData)`
`  "1" "2" "3" "4"`

`MyData <- MyData[ , -which(names(MyData) %in% c("1","3"))]`

`names(MyData)`
` "2" "4"`

Get a list of column names in a nice organized way:
`paste(names(DF), collapse=", ")`
` "Col1, Col2, Col3, ..., Coln"`

Change the Order of a Factor
`MyData$FactorVector <- `
`                factor(MyData$FactorVector,`
`                levels = c("06", "05", "04", "03", "02", "01"))`

`#ALSO`
`MyData$time <- factor(MyData$time, rev(levels(MyData$time)))`
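
A sketch of what reversing the levels does. Only the ordering of the levels changes, not the values. The vector `time` here is made up.

```r
time <- factor(c("01", "02", "03", "02"))
levels(time)                          # "01" "02" "03"

time_rev <- factor(time, levels = rev(levels(time)))
levels(time_rev)                      # "03" "02" "01"
as.character(time_rev)                # values unchanged: "01" "02" "03" "02"
```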

Define Colors in a ggplot2 graph
` ggplot(MyData, aes(x=time, y=value, group=geo,colour=factor(geo))) + `
`   geom_line(size = 2) +`
`   geom_vline(xintercept=c(18,30)) +`
`   scale_colour_manual(values=c("brown", "blue", "dark blue","light blue","grey"))`

Rename  Factor names:
http://www.cookbook-r.com/Manipulating_data/Renaming_levels_of_a_factor/
`mapvalues(x, from = c("beta", "gamma"), to = c("two", "three"))`
`MyData$geo <-mapvalues(MyData$geo, `
`          from = c("DE", "FR", "IT", "SE", "UK"), `
`          to = c("Germany", "France","Italy","Sweden","England"))`

Tilt Axis Labels with ggplot2
`theme(axis.text.x=element_text(angle=-90))`

Given a String/Character for an Object, Evaluate the object named by the text.
`````` eval(parse(text="5+5"))
 10
class("5+5")
 "character"
class(parse(text="5+5"))
 "expression"``````

Plot - add latex expression into plot title

Complicated stuff:

Simple stuff
``````X <- 1:10
Y <- 1:10
a <- 0.8
plot(X,Y,main=bquote(R^2 : .(a)))``````

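Given that you saved an object (data frame) to an `.RData` file like '`saved.file.rda`'
You could: `load('saved.file.rda')`, which restores the object under the name it was saved with.

To load it under a new name instead (this works when the file holds a single object):
`bar <- get(load('saved.file.rda'))`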

`total <- 20`
`# create progress bar`
`pb <- txtProgressBar(min = 0, max = total, style = 3)`
`for(i in 1:total){`
`   Sys.sleep(0.1)`
`   # update progress bar`
`   setTxtProgressBar(pb, i)`
`}`
`close(pb)`

Install an R Package/Library from a specific cran repository

`install.packages("PackageNames", repos = "http://cran.cnr.Berkeley.edu/")`

When using dplyr::select to rearrange a data frame's columns, how do I attach all other columns?

`dplyr::select(DF, var1, var2, -var3, matches("."))`

This will take the data frame DF, make var1 the first column, var2 the second, remove var3, and then attach all the other columns/variables that existed to the rest of the data frame.

Grouping collection of asynchronous events, by dates
That is, suppose you have a big flow of events all coming in at different dates. Suppose you want to put these dates into sets of, say, week1, week2, or month1, month2 etc.
Here's one way from the valve dataset.

`DFPlot$nuDate = cut(x = as.Date(DFDate), breaks = as.Date(7 * (-94:2), origin = "2013-05-21" ))`
`DFPlot <- DFPlot %>%`
`  dplyr::select(`
`    Item, Date, nuDate, matches(".")`
`  ) %>%`
`  dplyr::mutate(`
`    Date = as.Date(nuDate)`
`  )`

You'll obviously need to set the `breaks` to the appropriate date ranges.
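
A stripped-down sketch of the same idea in base R, using three made-up dates and two week-long bins (`dates`, `week_breaks` and `week_bin` are illustrative names):

```r
dates <- as.Date(c("2013-05-21", "2013-05-23", "2013-05-29"))

# Three break points, one week apart, define two week-long bins
week_breaks <- as.Date(7 * (0:2), origin = "2013-05-21")
week_bin <- cut(x = dates, breaks = week_breaks)

as.character(week_bin)   # "2013-05-21" "2013-05-21" "2013-05-28"
```

Each date is labelled with the start date of the bin it falls in, and any date outside the breaks comes back as NA.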

Simple Stats Functions - you should probably memorize

SUMMARY & SIMPLE STATS
mean() -- gives the arithmetic mean
sum() -- gives the total
sd() -- gives the standard deviation
min() -- minimum
max() -- maximum
length() -- count.
str() -- gives a sense of the structure of a dataframe (number of rows and columns, and a few example values)

Sequences
5:9 => 5,6,7,8,9
seq() - e.g. seq(3) => 1 2 3
seq(3,5) => 3,4,5
seq(3,5,0.5) => 3,3.5,4,4.5,5
9:5 => 9,8,7,6,5

Manipulating Vectors

c() - stands for 'combine' apparently

`x = c('I','want','to')`
`x <- c(x,'party')`

`x`
` "I" "want" "to" "party"`

x[c(2,3)]
 "want" "to"
x[2:4]
 "want" "to" "party"

cbind() - attach a column to a data.frame
rbind() - attach a row to a data.frame
union() - append a value to a vector. So, similar to cbind or rbind, but just adding vectors together (or adding values to vectors). Also, union combines only UNIQUE values, so union(c(1,2),c(2,3)) = c(1,2,3).
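
The difference between plain `c()` and `union()` in one small sketch:

```r
a <- c(1, 2)
b <- c(2, 3)

c(a, b)       # 1 2 2 3  (keeps the duplicate)
union(a, b)   # 1 2 3    (unique values only)
```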

Merging Datasets
Merging tips - rbind cbind, merge
Comparing datasets - dupsBetweenGroups, splitting, checking for duplicates,

Merge two datasets by two (or more) column names
`data.new <- merge(merge_data_x,`
`                  merge_data_y,`
`                  by.x = c("ID", "ID2"),`
`                  by.y = c("ID", "ID_2"))`

Also note about dropping "missing" data - the default setting drops all unmatched cases. To keep them, set all=TRUE
`data.new <- merge(x, y, by.x = "ID", by.y = "ID", all=TRUE)`

Suppose you are okay with dropping unmatched instances from one dataset, but not the other. Say you want to keep all rows in x, but can drop unmatched rows of y.
`data.new <- merge(x, y, by.x = "ID", by.y = "ID", all.x=TRUE)`
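
The three variants side by side, on two tiny made-up data frames (a sketch, where `inner`, `full` and `left` are just labels I chose):

```r
x <- data.frame(ID = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(ID = c(2, 3, 4), b = c("y2", "y3", "y4"))

inner <- merge(x, y, by = "ID")                 # matched rows only
full  <- merge(x, y, by = "ID", all = TRUE)     # everything, NA where there is no match
left  <- merge(x, y, by = "ID", all.x = TRUE)   # every row of x, matched or not

inner$ID   # 2 3
full$ID    # 1 2 3 4
left$ID    # 1 2 3
```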

length()
Count the number of values/objects in a vector

`hey=c(1,4,5,3,2,2,2)`
`length(hey)`

`     7`

`length(unique(hey))`

`     5`

#subsets a ticks file (experimental data), for the name of the block you're interested in, then gives the numbers of the subjects in that subset:
`unique((subset(ticks20120214, name=="4p-600-1-nPL-d"))$subject)`

`     4 3 2 1`

paste() -- print combine text and data
lets you combine variables, numbers and strings/text into a single text. Useful for plots, tables, histograms, etc. Don't forget the 'sep=...' field.
X1="POOP"
paste(X1,"- Continuous Time - Color=Player",sep=" ")

` "POOP - Continuous Time - Color=Player"`

Round
round() -- the value, and the number of digits past the decimal point.
`round(0.123456789,digits=4)`
` 0.1235`

sample() - random plus more
Random integer between 1 and 100
`sample(1:100,1,replace=T)`
`      47 #SOME RANDOM INT`

`sample(100, 4, replace=TRUE)`
`     67  4  1 45`

`L26=LETTERS[1:26]`
`sample(L26, 4, replace=TRUE)`

`     "R" "K" "M" "R"`

Fun With Variable Names, Strings, and Values
Hard to explain, but say you have dataframes called ticks2012a, and ticks2012b. Say you want to cycle through them. (Note the values in variable 'x' below are all strings!)
You wouldn't be able to loop through ticks2012a, because it would start going through the values inside that dataframe.
`ticks2012a<-c(1,2,3,4)`
`ticks2012b<-c(5,6,7,8)`
`x<-c("ticks2012a","ticks2012b")`
`for (i in c(1:length(x))) {`
`    print(get(x[i]))`
`}`
` 1 2 3 4`
` 5 6 7 8`

See:
`x<-c(ticks2012a,ticks2012b)`
`for (i in c(1:length(x))) {`
`    print(x)`
`}`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`
` 1 2 3 4 5 6 7 8`

Now going the other way,
Converting a Value Name (or data name, or object name etc.) into a string:
`data <- c(1,2,3,4)`
`deparse(substitute(data))`
` "data"`
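
Where this trick earns its keep is inside a function, to recover the name of whatever was passed in. A minimal sketch, where `name_of` is a hypothetical helper and not something from a package:

```r
# Hypothetical helper: returns the name of whatever object was passed in
name_of <- function(obj) {
  deparse(substitute(obj))
}

ticks2012a <- c(1, 2, 3, 4)
name_of(ticks2012a)   # "ticks2012a"
```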

Combinations - gives the orderings (permutations) of a vector or a sequence of integers.
(permn() is in the combinat package, not in base; combn() is in utils)
`library(combinat)`
`permn(seq(3))`
`   #gives all 6 possible permutations`

Testing for and Removing NAs or NaNs - many functions don't work if you have any NAs or NaNs in the vector you're curious about. Easy to remove them.

` a=c(1,2,3,4,NA)`
` mean(a)`
` NA    #rough!`

`mean(a,na.rm = TRUE)`
` 2.5     #good!`

Testing for and Removing NULLs

`is.null()`

`a <- c()`
`is.null(a)`
` TRUE`

Using `as.character()` to find and replace NULLs in a vector
~ this is probably a dirty way to do this, but effective, and fast enough for me.
`ifelse(test, yes, no)`
`ifelse(as.character(data$column)=="NULL","This was NULL!",data$column)`

Which and Head - Given a vector of values, and given a number you're interested in, find the next highest and the next lowest value to your value of interest in that vector.

`data<-c(0,2,3,5,4,6,8)`
`x=3.5 #the number of interest`
`y=sort(data)`
`Above <-y[min(which(y>x))]`
`Below <-y[max(which(y<x))]`

%in% - Check whether or not a value is inside another vector/data-frame.

`x<-c(0,1,2,3,4,5,6,8)`
`3 %in% x     #TRUE`
`7 %in% x     #FALSE`

Not In - Check whether or not a value is not-inside another vector/data-frame.
`x<-c(0,1,2,3,4,5,6,8)`
`!(3 %in% x)    #FALSE`
`!(7 %in% x)    #TRUE`

Given two Vectors, return the values in one that are not in the other.
`# Set-up: `
`Bigger <- c(1,2,3,4,5); Smaller <- c(1,4)`

Missing Values - Returns Values In Bigger that are not in Smaller
`Bigger[!(Bigger %in% Smaller)]`
` 2 3 5`

Count of the number of items in one list that are not in another
`length(unique(Bigger[!(Bigger %in% Smaller)]))`

When does a value appear in a vector? - return an index.
match(VALUE,VECTOR)
Returns the index in which the value "VALUE" first appears in the vector "VECTOR"
`a <- 1`
`b <- c(3,4,5,6,4,3,1,222,5)`
`match(a,b)`
`  7`
` `

Fast Find and Replace
Original_Data[Original_Data == "Incorrect"] <- "Correct"

`#suppose you want to replace all 4's with 10s`
`dataset <- c(1,2,3,3,3,4,5,6,7,5,4,3,2,2,3,4)`
`dataset[dataset == 4] <- 10`

`> dataset`
`   1  2  3  3  3 10  5  6  7  5 10  3  2  2  3 10`

And you can set this up to cycle through many find-and-replace terms
`Incorrect <- c("Error1","Error2","Error3","Error4")`
`Correct <- c("Correct1","Correct2","Correct3","Correct4")`
`# dataset <- ... and the dataset has all the values you want to look into....`
`for (i in 1:length(Incorrect)){`
`  dataset[dataset == Incorrect[i]] <- Correct[i]`
`}`
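
The same loop on a small made-up vector, so it can be run as-is (a sketch, where the values in `dataset` are invented):

```r
Incorrect <- c("Error1", "Error2")
Correct   <- c("Correct1", "Correct2")
dataset   <- c("Error1", "fine", "Error2", "Error1")   # made-up data

# Replace each "Incorrect" term with the "Correct" term in the same position
for (i in seq_along(Incorrect)) {
  dataset[dataset == Incorrect[i]] <- Correct[i]
}
dataset   # "Correct1" "fine" "Correct2" "Correct1"
```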

ggplot2

ggplot2 - a line plot over time, with dots and cool stuff

`library(ggplot2)`
`ggplot(data=MyData, aes(x=xVal, y=yVal, group=Groups, colour=COLOR)) + `
`  geom_point(size=4) +`
`  geom_line(size=1)`

ggplot2 - multiplot

`multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {`
`  require(grid)`
`  `
`  # Make a list from the ... arguments and plotlist`
`  plots <- c(list(...), plotlist)`
`  `
`  numPlots = length(plots)`
`  `
`  # If layout is NULL, then use 'cols' to determine layout`
`  if (is.null(layout)) {`
`    # Make the panel`
`    # ncol: Number of columns of plots`
`    # nrow: Number of rows needed, calculated from # of cols`
`    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),`
`                     ncol = cols, nrow = ceiling(numPlots/cols))`
`  }`
`  `
`  if (numPlots==1) {`
`    print(plots[[1]])`
`    `
`  } else {`
`    # Set up the page`
`    grid.newpage()`
`    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))`
`    `
`    # Make each plot, in the correct location`
`    for (i in 1:numPlots) {`
`      # Get the i,j matrix positions of the regions that contain this subplot`
`      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))`
`      `
`      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,`
`                                      layout.pos.col = matchidx$col))`
`    }`
`  }`
`}`

`  multiplot(p7,p1,p2,p3,p4,p5,p6,`
`            layout=(matrix(c(1,1,1,1,1,1,2,3,4,5,6,7), `
`            2, 6, byrow = TRUE)))`
`    # top row is one plot`
`    # second row is 6 plots`

ggplot2 - change the order of a plot legend

ggplot2 - reverse the order of a discrete or continuous scale

ggplot2 background color
`theme(panel.background = element_rect(fill='white', colour="white"))`

R - String for a quotation mark

`paste(" 'Hey!' ")`

plyr id() function

Garbage Collection: Function: gc()
Report on memory usage. Gives you a sense of the system resources at your disposal.

melt() - data, id, na.rm
Converts a multi-column data frame into a three column data frame.

`require(reshape2) `

`nuDF <- `
`melt(data = DF,              value.name = "Value")`
- Where DF is obviously the data frame
- If you don't pass `id.vars`, melt() picks the id (stacking) columns itself: the non-numeric (factor/character) columns, which is often just the first column. Though that technically doesn't affect how useful the data is, it's helpful to keep that in mind.
- Where "value.name" is really up to you. It's whatever you think is appropriate
- This will take every other column, and make it a row entry
- with col2 = "variable", the old column's name.

acast() - data, by ids, variable, mean,

dcast()

`  dcast(DF, ID_1 ~ Direction, value.var = "ID_2", list)`
DF is the data frame
ID_1 is the thing you want unique values of.
Direction is the thing that you want converted into columns (say, Direction 0 and 1)
Value.Var is the value that actually gets inserted into the new column.
list is just a function that gets applied to those values. list just puts them in a list. Leaving it blank will just put in the values (assuming there is only one value per cell; otherwise dcast falls back to counting the values with length).

This will now show each unique Id1+Id2 combo
dcast(DF, ID_1 + ID_2 ~ Direction, value.var = "ID_3", list)

Update the Version of R on Your Machine (Windows)
1) install the new version
2) Make note of all of the packages/libraries that you installed.
3) Uninstall the old version of R
4) Run install.packages("...") for all the packages you will want.

ddply - a useful plyr function for finding stats by variable settings

subsetting with ddply
`ddply(mydata, .(instrument), summarise, `
`      avgProfit = mean(TCurr[TCurr > 0]))`

this takes the subset of TCurr such that that variable is greater than 0

Conditional applying
`system.time({`
`  NuData <- ddply(Prices.MovingAvg, `
`                  .(DefID_AppID_Q), `
`                  summarize,`
`                  NumDaysTraded = length(Turnover[Turnover!=0]))`
`})`

Dates
as.Date

Change the Way a Date is Formatted (e.g. going from "/" separated dates to "." or "-" separated date formats).
``````format(as.Date(0, origin = "2011-08-09"),  "%Y.%m.%d")
# Converts to "2011.08.09"``````

Time
`strptime(paste("07/10/13","10:30"), "%m/%d/%y %H:%M")`

Symbol -- Meaning -- Example
%d -- day as a number (01-31) -- 01-31
%a -- abbreviated weekday -- Mon
%A -- unabbreviated weekday -- Monday
%m -- month (01-12) -- 01-12
%b -- abbreviated month -- Jan
%B -- unabbreviated month -- January
%y -- 2-digit year -- 07
%Y -- 4-digit year -- 2007
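
A short sketch of the numeric symbols in action. I've left out the weekday and month-name symbols, since their output depends on your locale.

```r
d <- as.Date("01/15/2007", format = "%m/%d/%Y")   # parse a "/" separated date

format(d, "%d")         # "15"
format(d, "%m")         # "01"
format(d, "%y")         # "07"
format(d, "%Y.%m.%d")   # "2007.01.15"
```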

Noussair Model - Model with Interaction Term between a categorical variable (a factor variable) and a continuous variable.
`library(reshape2)`
`df <- data.frame(x = 1:300)`
`df$y1 <-  (0.7/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))`
`df$y2 <-  (0.5/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))`
`df$y3 <-  (0.3/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))`
`df <- melt(df, id = 1)`
These three treatments all converge to the same value (0.1), but start at different initial values, depending on membership to the factor variable y1, y2 or y3. The following will estimate these:

`summary(lm(df$value ~ 0 + df$variable:(I(1/df$x)) + I((df$x-1)/df$x)))`

Check if a subset of numbers are in a larger set, return true if yes, false if no.

checklist <- c(1,2,3)
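
One way to finish this thought (a sketch, not necessarily what was originally intended here): combine `%in%` with `all()`. The vector `bigger` is a hypothetical larger set.

```r
checklist <- c(1, 2, 3)
bigger <- c(0, 1, 2, 3, 4, 5)   # hypothetical larger set

all(checklist %in% bigger)      # TRUE: every value was found
all(c(1, 2, 9) %in% bigger)     # FALSE: 9 is missing
```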

Text Functions

Return Index of Character in String
Suppose you want to find the index number of the first instance of a character in a string?
E.g., you have "123456_789", and you want to return the index of the character "_"?

`regexpr("_","263_6")`

`Returns: 4`

Substring Function
substr(String, start, stop) - and the start and stop are inclusive

`substr("123456789", 2,5)`
` "2345"`

`dplyr::mutate(`
`    data,`
`    newVar = substr(Var1, nchar(as.character(Var1)) - 1, nchar(as.character(Var1)))`
`)`
`# returns the last two characters from the column of strings in Var1`

SubString Surrounded by...
A function that returns the string surrounded by the first and second appearance of a particular character.
`E.g. SubStrSur("_", "123_456_789") returns "456"`

`SubStrSur <- function(Exp,x){`
`   #Given a string, returns the value between the first two appearances of Exp`
`   NuStr <- substr(x, regexpr(Exp,x)+1,nchar(x))`
`   substr(NuStr,1, regexpr(Exp,NuStr)-1)`
` }`

Using Sapply and strsplit (string split) to find substrings
`PriceObs.SM_Study$AppID = `
`           sapply(strsplit(PriceObs.SM_Study$DefID_AppID_Q, "_"),"[[",2)`
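
The same pattern on a plain vector, to show what each step returns (a sketch, where the strings in `ids` are made up):

```r
ids <- c("12_440_Q1", "13_570_Q2")       # made-up "DefID_AppID_Q" style strings

pieces <- strsplit(ids, "_")             # list of character vectors, one per string
app_id <- sapply(pieces, "[[", 2)        # pull the 2nd piece out of each
app_id   # "440" "570"
```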

For ggplot2 xlim ylim, if you don't want to drop rows that fall outside the limits, you need to use coord_cartesian, which zooms the view without removing data
`coord_cartesian(xlim = NULL, ylim = NULL)`

dplyr tools and tips
Introduction to dplyr - http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

Remove Grey/Gray border around legend
``+ theme(legend.key = element_blank())``

Smoothing Stuff - for heatmaps, and contour stuff.

#interp converts x,y,z values into something plots can work with (x,y vector, and z matrix)
`s = interp(x,y,z,`
`           xo=seq(0,5, length=100),`
`           yo=seq(0,5, length=100), `
`           duplicate = "median")`

#smooth.2d is a function that takes a list of s's structure (which is what plotting functions use for heatmaps) and smooths those z values.
`smooth.2d(z, x=cbind(x,y), theta=0.5)`

#a heatmap maker:
`    x <- CoopPlayerSummary.temp$AggUndercutNum`
`    y <- CoopPlayerSummary.temp$CounterpartAggUndercutNum`
`    z <- loess(Score ~ x + y,data=CoopPlayerSummary.temp,span=0.9)`
`    t. <- interp(x,y,z$fitted,`
`                 xo=seq(0,90, length=100),`
`                 yo=seq(0,90, length=100), duplicate = "median")`
`    t.df <- data.frame(t.)`
`    t.df[is.na(t.df)] = 0.`

`    if (length(levels) == 0){`
`      zlim = range(gt$value, finite = TRUE)`
`      levels = pretty(zlim, NumCols )`
`    }`

`    colramp = colorRampPalette(c("#fafcef","#6b0200","#0f0000"))`
`    `
`    plot(NA,`
`         xlim=(c(0,90)),`
`         ylim=(c(0,90)),`
`         xlab = "...", `
`         ylab = "...",`
`         frame=FALSE,axes=F,xaxs="i",yaxs="i")`
`    `
`    .filled.contour(x=t.df$x, `
`                    y=t.df$y, `
`                    z=as.matrix(t.df[3:ncol(t.df),1:length(t.df$x)]),`
`                    levels = levels,`
`                    #nlevels=length(colramp(ncols)), `
`                    col = colramp(length(levels))`
`    )`
`    axis(1, seq(0.0, 90, 5), label=TRUE, tcl=-0.5, col = "white", col.ticks='grey')`
`    axis(2, seq(0.0, 90, 5), label=TRUE, tcl=-0.5, col = "white", col.ticks='grey',las=1)`

Progress Bar
Very simple!
`total <- 20`
`# create progress bar`
`pb <- txtProgressBar(min = 0, max = total, style = 3)`
`for(i in 1:total){`
`   Sys.sleep(0.1)`
`   # update progress bar`
`   setTxtProgressBar(pb, i)`
`}`
`close(pb)`
