Text files with useful R code are attached at the bottom of this page.
Files with R code for specific purposes are also attached to method-specific pages on this site. The site search utility should find these when given an appropriate search term (such as ".R" or ".R.txt").
icons
http://fontawesome.io/icons/
Can be used with Shiny
Cleaning data
# This might be used to remove any cases where values are outside of a valid range (here, anything not greater than zero; note that zeros are dropped too)
dat2 = subset(dat.in, (x1 > 0) & (x2 > 0) & (x3 > 0) & (x4 > 0))
# This can be used to replace any non-valid (here, less than zero) values with an "NA" indicator
dat3 = dat.in[,1:4]
m <- as.matrix(dat.in[,5:13])
m[m<0] <- NA_real_
dat3[,5:13] <- as.data.frame(m)
# This function can be used to remove outliers based on IQR, reassigning the value to "NA"
# CAUTION: Be careful about omitting values that are "outliers", as the outlier might be a legitimate part of the population
remove_outliers <- function(x, na.rm = TRUE, ...) {
  # qnt has two elements: qnt[1] is the 25th percentile, qnt[2] is the 75th percentile
  qnt <- quantile(x, probs=c(0.25, 0.75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
dat$x1.FILTER = remove_outliers(dat$x1)
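Before overwriting anything, it can help to look at which values the IQR rule would flag. The following is a minimal sketch on made-up data (the vector 'x' and the names 'lower', 'upper', 'is.out' and 'x.filtered' are assumptions for illustration, not part of the function above):

```r
# Toy data: nine ordinary values plus one extreme value
x <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 100)

# Fences at 1.5 * IQR below the 25th and above the 75th percentile
q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
h <- 1.5 * IQR(x, na.rm = TRUE)
lower <- q[1] - h
upper <- q[2] + h

# Flag the values outside the fences so they can be inspected first
is.out <- x < lower | x > upper
print(x[is.out])

# Only then replace the flagged values with NA
x.filtered <- ifelse(is.out, NA, x)
```

Reviewing 'x[is.out]' first follows the caution above: a flagged value may still be a legitimate part of the population.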
Use package dplyr and the function glimpse(df). Also available from package tibble.
NOTE - the approach below preceded some recent packages (mentioned above) that make this much easier.
for (i in 1:dim(dat)[2]){print(typeof(dat[,i]))}
# convert columns in dataframe from 'integer' to 'numeric' (double)
# coerce SOME integer columns to numeric
for (i in 4:dim(dat)[2]){dat[,i] <- as.numeric(dat[,i])}
# started with column 4, but tailor to each situation
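The same check and conversion can also be written without explicit loops. This is a sketch in base R on a small made-up data frame ('dat.demo' and the column names are assumptions for illustration); it uses the df[cols] <- lapply(df[cols], FUN) pattern:

```r
# Made-up data frame with integer and character columns
dat.demo <- data.frame(id = 1:3,
                       grp = c("a", "b", "c"),
                       n1 = c(4L, 5L, 6L),
                       n2 = c(7L, 8L, 9L),
                       stringsAsFactors = FALSE)

# Storage type of every column in one call
col.types <- sapply(dat.demo, typeof)
print(col.types)

# Coerce only selected columns from integer to numeric (double)
cols <- c("n1", "n2")
dat.demo[cols] <- lapply(dat.demo[cols], as.numeric)
```

Selecting columns by name rather than by position keeps the conversion correct if the column order changes.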
Convert fields ISO_Year and ISO_Week to 'date' of format YYYY-mm-dd
library(ISOweek) # provides ISOweek2date()
library(stringr) # provides str_pad()
d.in$ISOYearWeek = ISOweek2date(paste0(d.in$ISO_Year,"-W",str_pad(d.in$ISO_Week,2,"left","0"),"-1"))
Find a number from a string
# from http://stackoverflow.com/questions/17009628/extracting-numbers-from-string-in-r
# from http://stackoverflow.com/questions/18503177/r-apply-function-on-specific-dataframe-columns
# df[cols] <- lapply(df[cols], FUN)
# ALL = 1-12-12 (use 11212)
findnum.fn = function(findnum.x){
  # pass ONE string at a time (e.g. one element of 'd.ocrp.in$BU.and.Facility') to 'findnum.x'
  temp2 <- gregexpr("[0-9]+", findnum.x)
  if (grepl('ALL',findnum.x)){
    findnum.out = 11212
  } else if (any(unlist(temp2) > 0)){ # any() keeps the condition length one when several numbers match
    findnum.out = as.numeric(unique(unlist(regmatches(findnum.x, temp2))))
  } else { findnum.out = 99999 }
  findnum.out # return the result visibly
} # end function
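If only the first number in each string is needed, the lookup can be applied to a whole column at once. The following is a sketch under that assumption; 'extract.first.num', 'all.code' and 'none.code' are hypothetical names, the codes 11212 and 99999 are taken from the function above, and the input is assumed to contain no NA values:

```r
extract.first.num <- function(x, all.code = 11212, none.code = 99999) {
  pos <- regexpr("[0-9]+", x)          # position of the first run of digits, -1 if none
  out <- rep(none.code, length(x))     # default: no number found
  hit <- pos > 0
  out[hit] <- as.numeric(regmatches(x, pos))    # regmatches() returns only the matches
  out[grepl("ALL", x, fixed = TRUE)] <- all.code  # 'ALL' overrides any number found
  out
}

extract.first.num(c("BU 12 North", "ALL sites", "no digits here"))
```

Unlike the function above, this sketch keeps one number per string, so strings containing several different numbers lose all but the first.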
Individuals and Moving Range Chart
library(qcc)
# Individuals (X) chart
qcc.out = qcc(dat$var, type = 'xbar.one')
# Moving Range (MR) chart
dat.MR = cbind(dat$var[-length(dat$var)], dat$var[-1]) ## Form subgroups of two consecutive values
MRchart = qcc(dat.MR, type = "R")
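As a cross-check on the charts, the moving ranges and control limits can be computed by hand. This is a sketch on made-up data using the usual control chart constants for subgroups of size two (2.66 = 3/d2 with d2 = 1.128, and D4 = 3.267); the vector 'x' and the object names are assumptions for illustration, and the results may differ slightly from qcc depending on its settings:

```r
# Made-up individual measurements
x <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3)

# Moving range: absolute difference between consecutive values
mr <- abs(diff(x))
x.bar <- mean(x)
mr.bar <- mean(mr)

# Individuals chart limits: centre line +/- 2.66 * average moving range
limits.x <- c(LCL = x.bar - 2.66 * mr.bar, CL = x.bar, UCL = x.bar + 2.66 * mr.bar)

# Moving range chart limits: lower limit is zero for subgroups of two
limits.mr <- c(LCL = 0, CL = mr.bar, UCL = 3.267 * mr.bar)

print(round(limits.x, 3))
print(round(limits.mr, 3))
```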
Monitoring the progress of loops (single or nested)
cat(paste("k = ", k, " j = ", j, "\n")); flush.console()
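For long loops, a progress bar is an alternative to printing every index. The sketch below uses txtProgressBar() from the utils package; the loop sizes and the names 'n.outer', 'n.inner' and 'step' are assumptions for illustration:

```r
n.outer <- 3
n.inner <- 2

# One bar covering all iterations of both loops
pb <- txtProgressBar(min = 0, max = n.outer * n.inner, style = 3)
step <- 0
for (k in seq_len(n.outer)) {
  for (j in seq_len(n.inner)) {
    # ... work for iteration (k, j) goes here ...
    step <- step + 1
    setTxtProgressBar(pb, step)
  }
}
close(pb)
```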
Reading data (with header) from the clipboard
copied.data <- read.table("clipboard", header = TRUE)
NOTE: This also works when the data is included in the R script file (for instance, at the bottom): copy that data section to the clipboard, then run the line above.
Reading using Windows (or OS) file explorer
dat = read.csv(file.choose())
Reading data from Excel
# www.mirai-solutions.com
library(XLConnect)
wb = loadWorkbook("myworkbook.xlsx")
my.data = readWorksheet(wb, sheet = 1)
Reading data from within R using 'textConnection'
d1 <- read.table(textConnection("x y
1 a X
2 b X
3 c X
4 f Z
5 e Z
6 g Z"), header = TRUE, row.names = 1)
this.variable <- rep(NA, length(that.variable))
or
B = vector(mode = "numeric", length = 0)
B = append(B, 5, after = length(B))
Initializing a matrix
append vs. paste
From nabble.com (http://is.gd/BAI1LV)
Initializing like this is generally recommended only when the object sizes get very big. Otherwise it's ok to grow your object as you go (i.e., append to it on each iteration).
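To make the two approaches concrete, here is a sketch that fills a preallocated vector, fills a preallocated matrix, and grows a vector on each iteration. The object names and the size 'n' are assumptions for illustration:

```r
n <- 5

# Preallocated vector: create it at full length, then fill by index
squares <- numeric(n)
for (i in seq_len(n)) squares[i] <- i^2

# Preallocated matrix: fill with NA first, then assign row by row
res <- matrix(NA_real_, nrow = n, ncol = 2,
              dimnames = list(NULL, c("i", "i.squared")))
for (i in seq_len(n)) res[i, ] <- c(i, i^2)

# Growing as you go: fine for small objects, slower for very large ones
grown <- numeric(0)
for (i in seq_len(n)) grown <- c(grown, i^2)

identical(squares, grown)
```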
Use package dplyr and the function glimpse(df). Also available from package tibble.
The function below preceded the 'glimpse' function available in dplyr and tibble packages.
fn.is.date1 <- function(x){ !all(is.na(as.Date(as.character(x),format="%m/%d/%Y"))) }
fn.is.date2 <- function(x){ !all(is.na(as.Date(as.character(x),format="%Y-%m-%d"))) }
var.info.fn <- function(df.in){
  var.info = data.frame()
  for (i in 1:dim(df.in)[2]){
    var.info[i,1] <- names(df.in)[i]
    var.info[i,2] <- typeof(df.in[,i])
    var.info[i,3] <- is.numeric(df.in[,i])
    var.info[i,4] <- is.factor(df.in[,i])
    var.info[i,5] <- fn.is.date1(df.in[,i])
    var.info[i,6] <- fn.is.date2(df.in[,i])
  } # can add columns for other data types
  colnames(var.info) = c("COLNAME","TYPEOF","IS.NUM","IS.FAC","IS.DATE1","IS.DATE2")
  return(var.info) # the result is printed when the function is called at the console
}
var.info.fn(dat)
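A loop-free variant of the same idea can be sketched with vapply(). The function name 'var.info2.fn', the output columns CLASS and N.NA, and the demo data frame are assumptions for illustration; this version reports the class and the count of missing values instead of the date checks:

```r
var.info2.fn <- function(df.in) {
  data.frame(
    COLNAME = names(df.in),
    TYPEOF  = vapply(df.in, typeof, character(1)),
    CLASS   = vapply(df.in, function(col) class(col)[1], character(1)),
    N.NA    = vapply(df.in, function(col) sum(is.na(col)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up data frame with an integer, a character and a Date column
demo <- data.frame(a = 1:2,
                   b = c("x", NA),
                   d = as.Date(c("2020-01-01", "2020-01-02")),
                   stringsAsFactors = FALSE)
info <- var.info2.fn(demo)
print(info)
```

A Date column shows up with TYPEOF "double" and CLASS "Date", which is why both columns are reported.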
Taking a subset based on multiple criteria
dat.all = datin
dat.jam = subset(datin, (datin$event == "jam") | (datin$event == "none"))
Taking a subset of rows of a dataframe
randomSample = function(df,n) {
return (df[sample(nrow(df), n),])
}
dat.sub <- randomSample(dat, 2000)
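Because the rows are drawn at random, each call returns a different subset. If the same subset is needed again later, one option is to set the random seed first. A sketch on made-up data (the data frame 'dat.demo' and the seed value are assumptions for illustration):

```r
# Made-up data frame with 100 rows
dat.demo <- data.frame(id = 1:100, value = (1:100)^2)

# Same seed, same sample
set.seed(42)
sub1 <- dat.demo[sample(nrow(dat.demo), 10), ]
set.seed(42)
sub2 <- dat.demo[sample(nrow(dat.demo), 10), ]

identical(sub1, sub2)
```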
Function to create plot of both acf and pacf
library(forecast)
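# Named 'plot.acf.pacf.fn' rather than 'plot.acf', which is already the name of the
# plot method for 'acf' objects in package stats
plot.acf.pacf.fn = function(x){
  par(mfrow=c(2,1)) # two panels: ACF on top, PACF below
  maxlag = trunc(length(x)/3)
  Acf(x, lag.max = maxlag)
  Pacf(x, lag.max = maxlag)
  par(mfrow=c(1,1)) # restore single-panel layout
}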
ARIMA with fixed lags
> The standard call to ARIMA in the base package such as
>
> arima(y,c(5,0,0),include.mean=FALSE)
>
> gives a full 5th order lag polynomial model with for example coeffs
>
> Coefficients:
>          ar1    ar2      ar3     ar4      ar5
>       0.4715  0.067  -0.1772  0.0256  -0.2550
> s.e.  0.1421  0.158   0.1569  0.1602   0.1469
>
> Is it possible (I doubt it but am just checking) to define a more
> parsimonious lag1 and lag 5 model with coeff ar1 and ar5?
> Or do I need one of the other TS packages?

You can specify the "fixed" argument (with transform.pars=FALSE):

> arima(x, c(5, 0, 0), fixed=c(NA, 0, 0, 0, NA), include.mean=FALSE, transform.pars=FALSE)

Call:
arima(x = x, order = c(5, 0, 0), include.mean = FALSE, transform.pars = FALSE, fixed = c(NA, 0, 0, 0, NA))
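The "fixed" approach can be tried on simulated data. In the sketch below the series 'x', the true coefficients, the seed and the choice of method = "ML" are all assumptions for illustration; entries of 'fixed' set to NA are estimated and entries set to 0 are held at zero:

```r
set.seed(1)

# Simulate an AR(5) series in which only lags 1 and 5 are non-zero
x <- arima.sim(model = list(ar = c(0.5, 0, 0, 0, -0.3)), n = 500)

# Estimate ar1 and ar5 only; ar2, ar3 and ar4 are fixed at zero
fit <- arima(x, order = c(5, 0, 0),
             fixed = c(NA, 0, 0, 0, NA),
             include.mean = FALSE,
             transform.pars = FALSE,
             method = "ML")

print(coef(fit))
```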
If, when working from a corporate network, there is difficulty accessing CRAN and packages, here is one possible workaround. The actual gateway info will vary with the network.
THIS SEEMS TO WORK
http_proxy="http://gateway.us.zscaler.net:80/"
MIGHT ALSO BE NEEDED, BUT MAYBE NOT
utils::setInternet2(TRUE)
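One way to apply the proxy setting from inside an R session is to set it as an environment variable before installing packages. This is a sketch only: the gateway address is the one quoted above and will differ on other networks, and setting 'https_proxy' as well is an assumption that may or may not be needed:

```r
# Gateway address copied from the note above; replace with your network's proxy
proxy.url <- "http://gateway.us.zscaler.net:80/"

# Set the proxy for the current R session only
Sys.setenv(http_proxy = proxy.url)
Sys.setenv(https_proxy = proxy.url)  # assumption: only needed for https repositories

# Confirm what R will use
Sys.getenv(c("http_proxy", "https_proxy"))
```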
http://stackoverflow.com/questions/5244014/three-graphics-on-a-single-window-in-r
Color transparency in plot
# http://grokbase.com/t/r/r-help/13449khq4j/r-making-scatter-plot-points-fill-semi-transparent
pairs(dat.numeric, col=adjustcolor("blue", alpha.f=0.25)) # 'alpha.f' is the full argument name
color2hex.fn = function(colorname){
rgb.val = t(col2rgb(colorname))
rgb(rgb.val[1],rgb.val[2],rgb.val[3], maxColorValue = 255)
}
color2hex.fn('blue')
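The same conversion can be done for several colors at once, because rgb() accepts a three-column matrix. A sketch (the vector 'cols' and the object 'hex' are assumptions for illustration):

```r
cols <- c("blue", "red", "darkgreen")

# col2rgb() returns one column per color; transpose to one row per color
hex <- rgb(t(col2rgb(cols)), maxColorValue = 255)
names(hex) <- cols
print(hex)
```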
shiny download to client-side
http://shiny.rstudio.com/gallery/download-file.html
* qualityTools (1.0)
Thomas Roth
http://crantastic.org/packages/qualityTools
This is a package for teaching statistical methods in the field of
Quality Science. It covers distribution fitting, process capability,
Gage R&R, factorial, fractional factorial design as well as response
surface methods including the use of desirability functions.
* twiddler (0.2-0)
Oliver Flasch
http://crantastic.org/packages/twiddler
Twiddler is an interactive tool that automatically creates a Tcl/Tk
GUI for manipulating variables in any R expression. See the
documentation of the function twiddle to get started.
From the Revolution Analytics blog January 11, 2011
Hans Rosling popularized Motion Charts -- 2-d scatterplots that animate over time -- with the GapMinder project. Motion Charts were taken to their augmented-reality extreme in this clip from the BBC programme, The Joy of Stats, but now you can create similar (if less audacious) motion charts for yourself with just R and a Flash-enabled browser.
First, you'll need to install the googleVis package from CRAN, and then use the gvisMotionChart to create an object in R. Then, you'll need to export part of it to a text file, and embed the HTML into a web page. The googleVis documentation explains the details but a good place to start is tutorial. I followed it and with these four lines of R code:
install.packages("googleVis")
library(googleVis)
M <- gvisMotionChart(Fruits, "Fruit", "Year")
cat(M$html$chart, file="tmp.html")
I was able to create a page just like this. Of course, you can take this further and create animated visualizations of real data just as the Spatial Analysis blog has done, to create motion plots very similar to those from the GapMinder application. Check them out at the link below.
Spatial Analysis: R interface to Google Chart Tools
testin <- read.fwf(url("ftp://ftp.bls.gov/pub/time.series/sm/sm.data.1.Alabama", open="r"), c(2,1,2,5,2,8,2,100 ), header=FALSE, n=100, skip=1)
toread <- "id sex age inc r1 r2 r3
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5"
survey <- read.table(textConnection(toread), header = TRUE)
closeAllConnections()
survey
See ?read.table and ?textConnection for more information.
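read.table() also has a 'text' argument that takes the string directly, which avoids opening and closing a connection by hand. A sketch with a shortened, made-up version of the data (the names 'toread2' and 'survey2' are assumptions for illustration):

```r
toread2 <- "id sex age inc
1 F 35 17
17 M 50 14
33 F 45 6"

# No textConnection() or closeAllConnections() needed
survey2 <- read.table(text = toread2, header = TRUE)
str(survey2)
```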
See attached file "jaws.jnlp", which is a Java download for R-Cloud Workbench using Remote access to R/Bioconductor on EBI's 64-bit Linux Cluster.
Source URL: http://www.ebi.ac.uk/Tools/rcloud/
http://www.yeroon.net/ggplot2/
For Logistic Regression, there is often a need to convert the "linear predictor" into a probability. This is accomplished by applying the inverse logit transform, p = 1 / (1 + exp(-eta)), where eta is the linear predictor.
This can be coded by hand, or we can rely on existing functionality.
General R note: "plogis" has consequently been called the ‘inverse logit’. Check out ?plogis
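Both routes can be compared directly. In this sketch the linear predictor values in 'eta' are made up for illustration; 'plogis' and 'qlogis' are the logistic distribution and quantile functions from package stats:

```r
# Made-up values of the linear predictor
eta <- c(-2, 0, 1.5)

# Coded by hand: inverse logit
p.hand <- 1 / (1 + exp(-eta))

# Existing functionality
p.builtin <- plogis(eta)

all.equal(p.hand, p.builtin)

# qlogis() is the logit, so it maps the probabilities back to the linear predictor
eta.back <- qlogis(p.builtin)
```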