R-CRAN
Guide to getting started with R-CRAN software using web-based resources, tutorials and books. If you are looking for software for statistical computing, you are welcome to try this guide. If you still doubt whether learning R is worth your time and effort, check Rationale for using R.
R-CRAN biomedical minicourse
Comprehensive R Archive Network reference cards
Tutorials
R Graphical User Interfaces (GUIs)
Courses
Books
R-CRAN biomedical minicourse
Navigation in R
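getwd()              # show the current working directory
setwd("c:/")         # change the working directory (Windows path; on Linux use e.g. setwd("~/"))
use the up and down arrow keys to recall previous commands
list.files()         # list the files in the working directory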
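A minimal sketch of the navigation commands above, assuming you want to return to the starting directory afterwards. It uses the session's temporary directory (tempdir()) instead of c:/ so that it runs on any platform; the file name example.txt is hypothetical.

```r
# remember the current working directory so it can be restored later
old_wd <- getwd()

# move to the session's temporary directory (works on Windows, Linux and macOS)
setwd(tempdir())
print(getwd())

# create an empty example file and list the directory contents
file.create("example.txt")
print(list.files())

# go back to where we started
setwd(old_wd)
```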
Importing and managing data in the R-CRAN environment
read.csv()
How to prepare CSV data?
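One way to answer the question above: keep a single header row with short variable names (no spaces), one observation per row, a dot as the decimal separator, and empty cells or NA for missing values. The sketch below writes a small hypothetical data frame to a CSV file and reads it back. If your spreadsheet exports semicolon-separated files with decimal commas (common in Polish and other European locales), read.csv2() handles that layout.

```r
# hypothetical example data: one row per patient, one column per variable
patients <- data.frame(
  id  = c(1, 2, 3),
  sex = c("M", "F", "F"),
  age = c(34.5, 51.0, NA)   # NA marks a missing value
)

# write a comma-separated file without row names
csv_file <- file.path(tempdir(), "patients.csv")
write.csv(patients, csv_file, row.names = FALSE)

# read it back; header = TRUE and sep = "," are read.csv() defaults
patients2 <- read.csv(csv_file)
str(patients2)

# files using ";" as separator and "," as decimal mark: write.csv2() / read.csv2()
csv2_file <- file.path(tempdir(), "patients_eu.csv")
write.csv2(patients, csv2_file, row.names = FALSE)
patients3 <- read.csv2(csv2_file)
str(patients3)
```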
Datatypes
Summary statistics
Graphical representation of data (part 1)
Graphical representation of data (part 2)
Simple statistical tests
General linear model
Saving and running the code in R-CRAN
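A minimal sketch of saving commands to a script file and running it with source(). In practice you would write and save the script in an editor such as RStudio rather than with writeLines(); the file name my_analysis.R is hypothetical.

```r
# write a two-line R script to a file (normally you would save it from an editor)
script_file <- file.path(tempdir(), "my_analysis.R")
writeLines(c(
  "x <- c(2, 4, 6, 8)",
  "x_mean <- mean(x)"
), script_file)

# run the whole script; echo = TRUE prints each command as it is executed
source(script_file, echo = TRUE)

# objects created by the script are now available in the workspace
print(x_mean)
```

Outside an interactive session, the same file can be run from a terminal with Rscript my_analysis.R.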
R reference cards
Tutorials
Revolution Analytics R resources
Kickstarting R by Jim Lemon
Statistics with R by Vincent Zoonekynd
Fitting distributions with R - Ricci
Geographic data processing in R
R tutorial - MAT 356 R Tutorial, Spring 2004
http://linuxlearningsurveyresults.pbworks.com/w/page/34317936/FrontPage
Medical imaging in R
Perfusion analysis package - dcemri
R Commander
R Commander: an Introduction by Natasha Karp
Rattle
Java Gui for R – cross-platform stand-alone R terminal and editor based on Java (also known as JGR)
Deducer - GUI for menu-driven data analysis (similar to SPSS/JMP/Minitab). It has a menu system to do common data manipulation and analysis tasks, and an Excel-like spreadsheet for viewing and editing data frames. It is designed to be used with JGR but also works with RGui.
Rattle GUI – cross-platform GUI based on RGtk2 and specifically designed for data mining. Video tutorials
R Commander – cross-platform menu-driven GUI based on tcltk (several plug-ins to Rcmdr are also available)
RExcel – using R and Rcmdr from within Microsoft Excel
Sage – web browser interface as well as rpy support
Sim.DiffProcGUI – Graphical User Interface for Simulation of Diffusion Processes based on tcltk
Cantor (software) – KDE worksheet interface to several mathematical applications, including R
Red-R – visual analysis interface that uses R for statistics
Tinn-R – GUI for R Language and Environment
RKWard – extensible GUI and IDE for R
R AnalyticFlow - analysis flowcharts with R (freeware)
RStudio - cross-platform open-source IDE that can also run on a remote Linux server *recommended*
RapidMiner and the RapidMiner R extension - extensible open source GUI and IDE for R, Weka, and RapidMiner data mining processes and their seamless integration
Recommended R books (since listing all of them is almost impossible)
Introductory Statistics with R - Dalgaard
Data mining with R - Luis Torgo book companion web site
Handbook of Statistical Analyses Using R
Quick-R - R in Action by Quick-R author
R Contributed documentation - you will definitely find something interesting here
Using R for Data Analysis and Graphics - 3rd edition; files; ch1exercise; DM apps
P. Biecek - Przewodnik po pakiecie R (A Guide to the R Package, in Polish)
Courses
David Mease video lectures - Statistics 202: Statistical Aspects of Data Mining
The R Journal (formerly R News)
Primer of Biostatistics - Glantz
Chapter 2 - How to summarize data
Key concepts
Mean
Measures of variability: variance and standard deviation
The normal distribution
Percentiles
Random sampling
Bias
Experimental and observational studies
Randomized controlled trials
Central limit theorem
Problems
example 2-1
Data derived from
Quinn TC, Wawer MJ, Sewankambo N, Serwadda D, Li C, Wabwire-Mangen F, Meehan MO, Lutalo T, Gray RH. Viral load and heterosexual transmission of human immunodeficiency virus type 1. Rakai Project Study Group. N Engl J Med. 2000 Mar 30;342(13):921-9. PubMed PMID: 10738050.
HIV1RNAload <- c(79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808,
                 85781, 1251, 6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900)
Compute the mean, median, first quartile (Q1) and third quartile (Q3) of this data set using R-CRAN functions.
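A possible solution, assuming the vector holds the 20 values listed above (with 76712 and 256440 as separate entries). quantile() uses R's default type 7 definition; textbooks, including Glantz, may compute percentiles slightly differently, so Q1 and Q3 can differ a little from hand calculations.

```r
# viral load values from example 2-1
HIV1RNAload <- c(79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808,
                 85781, 1251, 6081, 50397, 11020, 13633, 1064, 496433,
                 25308, 6616, 11210, 13900)

mean(HIV1RNAload)                       # arithmetic mean
median(HIV1RNAload)                     # median (50th percentile)
quantile(HIV1RNAload, c(0.25, 0.75))    # Q1 and Q3 (type 7, R's default)
summary(HIV1RNAload)                    # min, Q1, median, mean, Q3, max in one call
```

The mean (61667.95) is more than four times the median (13956.5), a sign of strong right skew in these data.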
example 2-2
Perform a logarithmic transformation of the data from example 2-1 and evaluate the resulting distribution.
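A sketch of one way to do this, assuming a base-10 logarithm (the natural log, log(), would work equally well for assessing shape). It compares histograms and normal Q-Q plots before and after the transformation and runs the Shapiro-Wilk test on both versions; interpreting the output is left as part of the exercise.

```r
# viral load values from example 2-1
HIV1RNAload <- c(79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808,
                 85781, 1251, 6081, 50397, 11020, 13633, 1064, 496433,
                 25308, 6616, 11210, 13900)

# log10 transformation (all values are positive, so the log is defined)
log_load <- log10(HIV1RNAload)

# compare the shapes of the raw and transformed distributions
op <- par(mfrow = c(2, 2))
hist(HIV1RNAload, main = "Raw viral load", xlab = "HIV-1 RNA")
hist(log_load, main = "log10 viral load", xlab = "log10 HIV-1 RNA")
qqnorm(HIV1RNAload, main = "Normal Q-Q plot: raw"); qqline(HIV1RNAload)
qqnorm(log_load, main = "Normal Q-Q plot: log10"); qqline(log_load)
par(op)

# Shapiro-Wilk test of normality before and after the transformation
shapiro.test(HIV1RNAload)
shapiro.test(log_load)
```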
example 2-3
#######################################################
Selected methods, sample plots and solutions in R CRAN
###################
ANOVA
R statistical test implementation
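A minimal one-way ANOVA sketch using the PlantGrowth dataset that ships with R (not data from this course): aov() fits the model, summary() prints the ANOVA table, and TukeyHSD() gives post hoc pairwise comparisons.

```r
# PlantGrowth: dried plant weight (weight) in a control and two treatment groups (group)
data(PlantGrowth)
str(PlantGrowth)

# one-way ANOVA: does mean weight differ between the groups?
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# check assumptions: equal variances and approximately normal residuals
bartlett.test(weight ~ group, data = PlantGrowth)
shapiro.test(residuals(fit))

# post hoc pairwise comparisons with Tukey's honest significant differences
TukeyHSD(fit)
```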
###############################
# Practical linear regression training using 2012 London Olympics data
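The Olympics dataset is not included on this page, so the sketch below uses the built-in cars dataset (stopping distance vs. speed) to show a typical lm() workflow; substitute your own data frame and variable names.

```r
data(cars)

# simple linear regression: dist = b0 + b1 * speed + error
model <- lm(dist ~ speed, data = cars)
summary(model)          # coefficients, R-squared, F test
confint(model)          # 95% confidence intervals for b0 and b1

# scatterplot with the fitted regression line
plot(dist ~ speed, data = cars, pch = 20)
abline(model, col = "red")

# residual diagnostics (residuals vs fitted, normal Q-Q, etc.)
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# predicted stopping distance, with prediction intervals, for new speeds
predict(model, newdata = data.frame(speed = c(10, 20)), interval = "prediction")
```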
###################################
# install a new package from CRAN (the same call works on Linux, Windows and macOS); dependencies = TRUE also installs the packages it needs
install.packages("outliers", dependencies = TRUE)
#################################
Normality testing
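A minimal sketch comparing a graphical check (normal Q-Q plot) with the Shapiro-Wilk test on two simulated samples, one drawn from a normal and one from a log-normal distribution; the seed and sample sizes are arbitrary choices. A small p-value is evidence against normality, but with large samples even trivial departures become significant, so look at the plot as well.

```r
set.seed(42)                       # arbitrary seed for reproducibility
normal_sample <- rnorm(50, mean = 10, sd = 2)
skewed_sample <- rlnorm(50, meanlog = 0, sdlog = 1)

# graphical check: points should lie close to the line for normal data
op <- par(mfrow = c(1, 2))
qqnorm(normal_sample, main = "Normal sample"); qqline(normal_sample)
qqnorm(skewed_sample, main = "Log-normal sample"); qqline(skewed_sample)
par(op)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)
c(normal = sw_normal$p.value, skewed = sw_skewed$p.value)
```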
Detection and treatment of outliers
Engineering statistics textbook
Grubbs' test implementation in GraphPad
Grubbs' test implementation in the R CRAN outliers package
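A base-R sketch of the two-sided Grubbs test statistic and its critical value, following the standard textbook formula; it can be used to check what the packages above report. The function name grubbs_sketch and the sample vector are hypothetical. With the outliers package installed, grubbs.test() provides the test with a p-value. Grubbs' test assumes the remaining data are approximately normal and examines one suspect value at a time.

```r
# two-sided Grubbs test for a single outlier (hypothetical helper)
grubbs_sketch <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  dev <- abs(x - mean(x))
  G <- max(dev) / sd(x)                               # test statistic
  # upper critical t value with n - 2 df at significance alpha / (2n)
  t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
  G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  list(suspect = x[which.max(dev)], G = G, G_crit = G_crit,
       outlier = G > G_crit)
}

# hypothetical sample with one extreme value
sample_values <- c(1, 2, 3, 2, 1, 2, 3, 2, 50)
grubbs_sketch(sample_values)

# package equivalent (requires install.packages("outliers")):
# library(outliers); grubbs.test(sample_values, two.sided = TRUE)
```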
#####################################
Survival analysis
KM plots 1 - anthropological demography course
Enhanced Kaplan-Meier plot original
R CRAN plotting
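A minimal Kaplan-Meier sketch with the survival package (a recommended package distributed with R) and its lung dataset. The enhanced plots linked above add features such as numbers at risk, which base plot() does not draw.

```r
library(survival)

# lung: survival in advanced lung cancer
# time = follow-up in days, status = 1 censored / 2 dead, sex = 1 male / 2 female
fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(fit)              # number of events and median survival per group

# Kaplan-Meier curves with censoring marks
plot(fit, col = c("blue", "red"), mark.time = TRUE,
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)

# log-rank test for a difference between the two curves
survdiff(Surv(time, status) ~ sex, data = lung)
```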
###############################################
########### R Graphics training ##################
###############################################
####################
#R Graphics exercise 1
####################
# Creating a vertical stripchart with jitter
#R Code
#load the dataset from the OASIS Brains project website - the data are stored in a CSV file
oasis1 <- read.csv("http://www.oasis-brains.org/pdf/oasis_cross-sectional.csv", header=TRUE, sep=",", dec=".")
#check whether the file loaded properly and check the names of the variables
names(oasis1)
# attach the dataset to be able to access the variables just using their names
attach(oasis1)
#in this exercise we are interested in one continuous variable (Age) and one factor - gender (M.F)
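# since R 4.0.0 read.csv() keeps text columns as character, so use table() to count subjects by gender
table(M.F)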
summary(Age)
#the aim is to produce a publication-quality stripchart with jitter using those two variables
#we want a PNG file; however, some journals require TIFF files - see the tiff function in R using ?tiff
png(filename = "Oasis_age_gender_stripchart.png", width = 960, height = 960, units = "px", pointsize = 16, bg = "white", res = NA)
stripchart(Age ~ M.F, method = "jitter", vertical=TRUE, jitter = .1, pch=20, main="Distribution of age as a function of gender")
dev.off()
#0. Check the newly created file using an image viewer. Check the directory where the image was created using getwd().
#1. Check the distribution of other variables like MMSE and eTIV (estimated total intracranial volume) and create a new plot.
#2. Create a similar plot in other image file formats such as JPEG and TIFF. Why are the files different?
#3. Change the font size (pointsize) and the jitter value and evaluate the results.
#4. Read the stripchart help page (?stripchart).
# the end of R Graphics exercise 1
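For journals that require TIFF, a sketch of the same kind of jittered stripchart written at 300 dpi with LZW compression. It uses the built-in ToothGrowth dataset so that it runs without the OASIS file; the output file name is hypothetical, and TIFF output depends on how R was built, hence the capabilities() check.

```r
# hypothetical output file in the session's temporary directory
out <- file.path(tempdir(), "toothgrowth_stripchart.tiff")

if (capabilities("tiff")) {
  # 7 x 7 inches at 300 dpi, lossless LZW compression
  tiff(filename = out, width = 7, height = 7, units = "in", res = 300,
       compression = "lzw", pointsize = 12, bg = "white")
  stripchart(len ~ supp, data = ToothGrowth, method = "jitter",
             vertical = TRUE, jitter = 0.1, pch = 20,
             main = "Tooth length by supplement type")
  dev.off()
}
```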
####################
#R Graphics exercise 2
####################
# Creating a scatterplot with overlapping datasets and transparent colors
jpeg(filename = "overlapping_transparent_plots.jpeg", width = 960, height = 960, units = "px", pointsize = 17, quality = 100, bg = "white")
# create two variables x and y
x <- rnorm(100)+2
y <- rnorm(100)+1
#define color and transparency for each set of points
plotcolor <- rgb(red=255, green=100, blue=0, alpha=150, max=255)
plotcolor2 <- rgb(red=100, green=255, blue=100, alpha=200, max=255)
# plot the points(x,y) using created colors
plot(x,y, pch=21, lwd=3, col=plotcolor)
# plot the points(y,x) using created colors
points(y,x, pch=24, lwd=2, col=plotcolor2)
#close the device
dev.off()
#end of plot
#0. Check the visibility of various graphical elements in plots
#Controlling point visibility in plots
# pch=19: solid circle
# pch=20: bullet (smaller solid circle)
# pch=21: circle
# pch=22: square
# pch=23: diamond
# pch=24: triangle point-up
# pch=25: triangle point-down
# (pch 21-25 are outlined with the col colour and filled with the bg colour)
#1. Examine changing the transparency settings and color selection
R>col2rgb('blue', alpha=T)
[,1]
red 0
green 0
blue 255
alpha 255
R> rgb(red=0, green=0, blue=255, alpha=255, max=255)
[1] "#0000FFFF"
Try different values of alpha (0 <= alpha <= 255; 0 is fully transparent, 255 fully opaque) in the call above to get different levels of opacity for your points.
R> rgb(red=0, green=0, blue=255, alpha=10, max=255)
[1] "#0000FF0A"
#end of exercise 2
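Besides rgb(), base R's adjustcolor() (package grDevices) can add transparency to a named colour. The sketch below reproduces the hexadecimal codes from the transcript above and draws two overlapping clouds of simulated points with semi-transparent colours; the alpha.f value of 0.3 is an arbitrary choice.

```r
# reproduce the transcript above: the last two hex digits encode alpha
rgb(red = 0, green = 0, blue = 255, alpha = 255, maxColorValue = 255)  # "#0000FFFF"
rgb(red = 0, green = 0, blue = 255, alpha = 10, maxColorValue = 255)   # "#0000FF0A"

# adjustcolor() scales the existing alpha by alpha.f (0 = invisible, 1 = opaque)
semi_blue <- adjustcolor("blue", alpha.f = 0.3)
semi_red  <- adjustcolor("red",  alpha.f = 0.3)

# overlapping clouds of points stay readable with transparent colours
set.seed(1)
plot(rnorm(500), rnorm(500), pch = 19, col = semi_blue)
points(rnorm(500, mean = 1), rnorm(500, mean = 1), pch = 19, col = semi_red)
```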
####################
#R Graphics exercise 3
####################
#Create a scatterplot with a third dimension represented by point size, and control the transparency of overlapping points
#define variables
x <- rnorm(10,5,3)
y <- rnorm(10,7,3)
#define variable representing pointsize
z <- 1:10
#define color and transparency
plotcolor <- rgb(red=000, green=255, blue=000, alpha=100, max=255)
#plot
plot(x,y, pch=20, cex=z, col=plotcolor)
#define a vector of three colors; colors are recycled along the points, so every third point gets the same color
colors <- c(rgb(red=100, green=100, blue=250, alpha=150, max=255), rgb(red=000, green=255, blue=000, alpha=50, max=255), rgb(red=000, green=000, blue=000, alpha=100, max=255))
plot(x,y, pch=20, cex=z, col=colors)
#tasks
# use sort() function to create plots with incremental point size
# use pch parameter to define representation of points
# use more advanced color and fill representation
# how to create a dynamic plot in the style of Google motion charts, with time as a fourth variable?
#end of exercise 3
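One possible answer to the first three tasks, assuming "incremental point size" means that point size should grow along the x axis: sort() orders the x values, pch = 21 draws circles whose border (col) and fill (bg) are set separately, and the transparent fill keeps overlapping points visible. The seed and colours are arbitrary choices.

```r
set.seed(123)                         # arbitrary seed for reproducibility
x <- sort(rnorm(10, 5, 3))            # sorted, so point size increases along x
y <- rnorm(10, 7, 3)
z <- 1:10                             # third variable mapped to point size

# semi-transparent green fill and a darker, less transparent border
fill   <- rgb(red = 0, green = 255, blue = 0, alpha = 100, maxColorValue = 255)
border <- rgb(red = 0, green = 100, blue = 0, alpha = 200, maxColorValue = 255)

# extend the axis limits a little so the largest points are not clipped
plot(x, y, pch = 21, cex = z, col = border, bg = fill, lwd = 2,
     xlim = range(x) + c(-1, 1), ylim = range(y) + c(-2, 2))
```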
External R Graphics resources
Creating plots in R
Cyclismo.org plotting tutorial
Summer Institute for Training in Biostatistics - graphics tutorial
add.scatter (ade4) - method for producing barchart with distribution
Neuromaps
Aim: Speed up locating healthcare providers in neurological emergencies, including rapid stroke diagnosis and treatment
Method: Integration of existing data and services to help patients quickly find a healthcare provider in neurological emergencies
Spatial localization
Tools
R CRAN analysis of spatial data
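Locating the nearest provider is, at its core, a nearest-neighbour search on geographic coordinates. The base-R sketch below uses the haversine great-circle distance on a spherical Earth (radius 6371 km). The function name haversine_km, the provider list and the patient location are hypothetical illustrations (approximate city-centre coordinates, not real facilities); a production system would more likely use road travel times and dedicated spatial packages such as sf rather than straight-line distance.

```r
# great-circle distance in km between points given in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))   # pmin guards against rounding above 1
}

# hypothetical stroke units (approximate city-centre coordinates)
providers <- data.frame(
  name = c("Unit A (Warsaw)", "Unit B (Krakow)", "Unit C (Gdansk)"),
  lat  = c(52.23, 50.06, 54.35),
  lon  = c(21.01, 19.94, 18.65)
)

# hypothetical patient location
patient <- c(lat = 52.40, lon = 20.90)

providers$dist_km <- haversine_km(patient[["lat"]], patient[["lon"]],
                                  providers$lat, providers$lon)
providers[order(providers$dist_km), ]     # closest provider first
```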
Resources
Association of Geographic Information Laboratories for Europe
Polish geomarketing company - GfK
Department of Cartography and GIS - PAN
Geological map of Poland - methods