R
http://www.noitulove.ch/2008/07/03/learning-r-part-i/
http://rattle.togaware.com Rattle: Gnome R Data Mining
http://www.statmethods.net/index.html
ROOT http://root.cern.ch/drupal/
> x=c(1,2,3)
> sum(x)
[1] 6
> mean(x)
[1] 2
> length(x)
[1] 3
> ls()
[1] "weight" "x"
> objects()
[1] "weight" "x"
> 1/x
[1] 1.0000000 0.5000000 0.3333333
> y=2*x
> y
[1] 2 4 6
> plot(x,y)
> x=1:25
> y=sqrt(x)
> plot(x,y)
# add line to current graph
> lines(x,x^2)
> lines(x,log(x))
abline
curve
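The two function names above are noted without an example. A minimal sketch of how they are commonly used to draw on top of an existing plot, reusing the x=1:25 and y=sqrt(x) from the preceding lines; the straight-line fit and the reference values 3 and 9 are illustrative assumptions, not part of the original notes:

```r
# Sketch only: abline() and curve(add = TRUE) add to the current plot.
x <- 1:25
y <- sqrt(x)
plot(x, y)
abline(h = 3, lty = 2)        # horizontal reference line at y = 3
abline(v = 9, lty = 2)        # vertical reference line at x = 9
fit <- lm(y ~ x)              # illustrative straight-line fit
abline(fit, col = "red")      # abline() takes intercept and slope from the lm object
curve(sqrt(x), from = 1, to = 25, add = TRUE, col = "blue")  # overlay a function of x
```

Without add = TRUE, curve() starts a new plot instead of drawing on the current one.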
edit data
> data.entry(x) # Pops up spreadsheet to edit data
> x = de(x) # same spreadsheet, but changes are kept only because the result is assigned back to x
> x = edit(x) # uses editor to edit x.
barplot
> x=c(1,2,2,3,3,3,4,4,4,4)
> barplot(table(x)) #barplot of frequencies
> barplot(table(x)/length(x))
> x=seq(0,4,by=.1) # create the x values
> plot(x,x^2,type="l") # type="l" to make line
> curve(x^2,0,4)
f(x) = 1/(sqrt(2 pi) sigma) e^-((x - mu)^2/(2 sigma^2))
dnorm(x, mean=0, sd=1, log = FALSE)
pnorm(q, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean=0, sd=1)
x,q vector of quantiles.
p vector of probabilities.
n number of observations. If length(n) > 1, the length is taken to be the number required.
mean vector of means.
sd vector of standard deviations.
log, log.p logical; if TRUE, probabilities p are given as log(p).
lower.tail logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].
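To make the argument descriptions above concrete, here is a small sketch that checks dnorm against the density formula and shows how pnorm, qnorm, lower.tail and log relate. The values of mu, sigma and x are arbitrary choices for illustration:

```r
# Sketch: the four normal-distribution functions on a few illustrative points.
mu <- 1
sigma <- 2
x <- c(-1, 0, 2.5)

# Density formula written out by hand, compared with dnorm()
manual <- 1 / (sqrt(2 * pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
all.equal(manual, dnorm(x, mean = mu, sd = sigma))            # TRUE

# pnorm() gives P[X <= x]; qnorm() inverts it
p <- pnorm(x, mean = mu, sd = sigma)
all.equal(qnorm(p, mean = mu, sd = sigma), x)                 # TRUE

# lower.tail = FALSE gives the upper tail P[X > x]
all.equal(pnorm(x, mu, sigma, lower.tail = FALSE), 1 - p)     # TRUE

# log = TRUE returns the log of the density
all.equal(dnorm(x, mu, sigma, log = TRUE), log(manual))       # TRUE
```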
x=seq(-4,4,0.1)
plot(x,dnorm(x),type="l")
or curve(dnorm(x),from=-4,to=4)
density and cumulative distribution on same graph
x=-10:10
plot(x, pnorm(x),type="l")
lines(x, dnorm(x),type="l")
> x=rnorm(100)
> hist(x,freq=F)
> curve(dnorm(x),add=T)
x=-10:10
plot(x, pnorm(x),type="l")
plot(x, dnorm(x),type="l")
Binomial distribution
> x=0:50
> dbinom(x,size=50,prob=0.33)
plot(x, dbinom(x,size=50,prob=0.33), type="h")
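A short sketch of the related p and q functions for the same binomial distribution, together with two sanity checks on the probabilities. The cut-off 16 is an arbitrary illustrative value:

```r
# Sketch: density, CDF and quantile function for Binomial(size = 50, prob = 0.33).
size <- 50
prob <- 0.33
x <- 0:size
probs <- dbinom(x, size = size, prob = prob)

sum(probs)                      # total probability: 1 up to rounding
sum(x * probs)                  # mean of the distribution: size * prob
pbinom(16, size, prob)          # P[X <= 16]
all.equal(pbinom(16, size, prob), sum(probs[x <= 16]))   # CDF is the running sum of the density
qbinom(0.5, size, prob)         # median (smallest x whose CDF reaches 0.5)
```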
Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) > q), and to simulate from the distribution.
Prefix the name given here by `d' for the density, `p' for the CDF, `q' for the quantile function and `r' for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx and n for rxxx (except for rhyper and rwilcox, for which it is nn). The non-centrality parameter ncp is not yet available in every case: see the on-line help for details.
The pxxx and qxxx functions all have logical arguments lower.tail and log.p and the dxxx ones have log. This allows, e.g., getting the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by
- pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)
or more accurate log-likelihoods (by dxxx(..., log = TRUE)), directly.
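As a worked instance of the cumulative hazard formula, the sketch below uses the exponential distribution (pexp), for which H(t) = rate * t is known in closed form. The rate and the time points are illustrative assumptions:

```r
# Sketch: cumulative hazard H(t) = -log(1 - F(t)) for an exponential distribution.
rate <- 0.7
times <- c(0.5, 1, 2, 5)

H <- -pexp(times, rate = rate, lower.tail = FALSE, log.p = TRUE)
all.equal(H, rate * times)      # TRUE: matches the closed form

# Far in the tail the direct route stays finite, while the naive formula
# overflows because 1 - F(t) rounds to 0 in floating point.
H_direct <- -pexp(100, rate = rate, lower.tail = FALSE, log.p = TRUE)  # about 70
H_naive  <- -log(1 - pexp(100, rate = rate))                           # Inf
```

This is the accuracy gain the lower.tail and log.p arguments are meant to provide.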
> HousePrice <- read.table("c:\\floor.txt")
> HousePrice
   Price Floor Area Rooms Age Cent.heat
01 52.00   111  830     5 6.2        no
02 54.75   128  710     5 7.5        no
03 57.50   101 1000     5 4.2        no
04 57.50   131  690     6 8.8        no
05 59.75    93  900     5 1.9       yes
> summary(HousePrice)
     Price           Floor            Area          Rooms          Age        Cent.heat
 Min.   :52.00   Min.   : 93.0   Min.   : 690   Min.   :5.0   Min.   :1.90   no :4
 1st Qu.:54.75   1st Qu.:101.0   1st Qu.: 710   1st Qu.:5.0   1st Qu.:4.20   yes:1
 Median :57.50   Median :111.0   Median : 830   Median :5.0   Median :6.20
 Mean   :56.30   Mean   :112.8   Mean   : 826   Mean   :5.2   Mean   :5.72
 3rd Qu.:57.50   3rd Qu.:128.0   3rd Qu.: 900   3rd Qu.:5.0   3rd Qu.:7.50
 Max.   :59.75   Max.   :131.0   Max.   :1000   Max.   :6.0   Max.   :8.80
> area=HousePrice$Area #access to individual column
> mean(area)
[1] 826
> sd(area)
[1] 130.1153
> price=HousePrice$Price
> cor(area,price)
[1] 0.2982011
> l1 = lm(price ~ area)
> summary(l1)
Call:
lm(formula = price ~ area)
Residuals:
        1         2         3         4         5
-4.327377 -0.756054  0.009082  2.130833  2.943517
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.646559  10.550880   4.800   0.0172 *
area         0.006844   0.012649   0.541   0.6260
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.292 on 3 degrees of freedom
Multiple R-Squared: 0.08892, Adjusted R-squared: -0.2148
F-statistic: 0.2928 on 1 and 3 DF, p-value: 0.626
> l2 = lm(price ~ I(sin(2*pi*area)) + I(cos(2*pi*area)))
> summary(l2)
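The simple regression l1 can be reproduced without the input file by typing in the Price and Area columns from the listing above. The sketch also shows how to pull numbers out of the fit; the prediction at area = 800 is a hypothetical query, not something from the original notes:

```r
# Sketch: rebuild l1 from the five rows shown in the HousePrice listing.
price <- c(52.00, 54.75, 57.50, 57.50, 59.75)
area  <- c(830, 710, 1000, 690, 900)

l1 <- lm(price ~ area)
coef(l1)                        # intercept and slope, as in the Coefficients table
summary(l1)$r.squared           # for one predictor this equals cor(area, price)^2
residuals(l1)                   # the five residuals from the summary

# Fitted price for a hypothetical area of 800
predict(l1, newdata = data.frame(area = 800))

# Scatter plot with the fitted line
plot(area, price)
abline(l1)
```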
Packages in library 'C:/PROGRA~1/R/R-2.6.0pat/library':
base The R Base Package
boot Bootstrap R (S-Plus) Functions (Canty)
class Functions for Classification
cluster Cluster Analysis Extended Rousseeuw et al.
codetools Code Analysis Tools for R
datasets The R Datasets Package
foreign Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ...
graphics The R Graphics Package
grDevices The R Graphics Devices and Support for Colours and Fonts
grid The Grid Graphics Package
KernSmooth Functions for kernel smoothing for Wand & Jones (1995)
lattice Lattice Graphics
MASS Main Package of Venables and Ripley's MASS
methods Formal Methods and Classes
mgcv GAMs with GCV smoothness estimation and GAMMs by REML/PQL
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-forward Neural Networks and Multinomial Log-Linear Models
rcompgen Completion generator for R
rpart Recursive Partitioning
spatial Functions for Kriging and Point Pattern Analysis
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival analysis, including penalised likelihood.
tcltk Tcl/Tk Interface
tools Tools for Package Development
utils The R Utils Package
rle, filter, which
seq1=c(1,2,3,2)
seq2=sort(seq1)
rle(seq2)
Run Length Encoding
lengths: int [1:3] 1 2 1
values : num [1:3] 1 2 3
nums = c(12,9,8,14,7,16,3,2,9)
nums[nums>10]
[1] 12 14 16
> which(nums>10)
[1] 1 4 6
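The heading lists filter, but the notes give no example of it. Assuming it refers to stats::filter (linear filtering of a series), a minimal sketch on the same nums vector follows; the 3-point window is an arbitrary choice. Base R's Filter(), which selects elements by a predicate, is shown for comparison since the names are easy to confuse:

```r
# Sketch: two different "filter" functions in R.
nums <- c(12, 9, 8, 14, 7, 16, 3, 2, 9)

# stats::filter: centred 3-point moving average (NA at both ends)
ma <- stats::filter(nums, rep(1/3, 3))
ma

# Filter(): keep the elements for which the predicate is TRUE,
# equivalent to nums[nums > 10]
big <- Filter(function(v) v > 10, nums)
big
```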
> x = matrix(1:12,4,3)
> x
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
#mean of each row of a matrix
rowMeans(x)
[1] 5 6 7 8
#same result with apply
apply(x,1,mean) #1 means row here
[1] 5 6 7 8
> x[,1] #first column
[1] 1 2 3 4
> x[,c(3,1)] #3rd and 1st columns
[,1] [,2]
[1,] 9 1
[2,] 10 2
[3,] 11 3
[4,] 12 4
> x[2,] #second row
[1] 2 6 10
> x[10]
[1] 10
sum(x[1,]) # sum of first row
> apply(x, 1, sum) # by row
[1] 15 18 21 24
> apply(x, 2, sum) # by column
[1] 10 26 42
The autoregressive model is one of a group of linear prediction formulas that attempt to predict an output of a system based on the previous outputs and inputs, such as:
Y(t) = b1 + b2Y(t-1) + b3X(t-1) + et,
where X(t-1) and Y(t-1) are the actual value (inputs) and the forecast (outputs), respectively.
A model which depends only on the previous outputs of the system is called an autoregressive model (AR), while a model which depends only on the inputs to the system is called a moving average model (MA), and a model based on both inputs and outputs is an autoregressive-moving-average model (ARMA). Note that by definition, the AR model has only poles while the MA model has only zeros. Deriving the autoregressive model (AR) involves estimating the coefficients of the model using the method of least squares.
Autoregressive processes, as their name implies, regress on themselves. If X(t) is the observation made at time t, then the autoregressive model of order p, AR(p), satisfies the equation:
X(t) = F0 + F1X(t-1) + F2X(t-2) + F3X(t-3) + . . . + FpX(t-p) + et,
where et is a White-Noise series.
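A minimal sketch of the AR(1) case, assuming a simulated series: it generates data with a known coefficient and recovers it by least squares, once with a plain lm regression of X(t) on X(t-1) and once with the built-in ar(). The coefficient 0.6, the series length and the seed are illustrative choices, and this is not the code from the file referenced below:

```r
# Sketch: simulate an AR(1) process and estimate F1 by least squares.
set.seed(1)
phi <- 0.6                                       # true AR(1) coefficient (assumed)
n <- 2000
x <- arima.sim(model = list(ar = phi), n = n)    # zero-mean AR(1) with white-noise errors

# Regress X(t) on X(t-1)
xt   <- x[2:n]
xlag <- x[1:(n - 1)]
ls_fit <- lm(xt ~ xlag)
coef(ls_fit)                                     # intercept near 0, slope near phi

# Built-in estimator, also by ordinary least squares
ar_fit <- ar(x, order.max = 1, aic = FALSE, method = "ols")
as.numeric(ar_fit$ar)                            # estimate of phi
```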
R code example for AR(1) from here http://jblevins.org/computing/r/mle/ local file mre.R
Plotting and sorting example:
tzsize <-read.table("c:\\michael\\tz_size.txt", header=TRUE)
sorted= tzsize[order(tzsize$tz),]
x = sorted$size
names(x) = sorted$tz
barplot(x)
mycolors=c("red","blue","green","brown")
barplot(x,col=mycolors)
> t=read.table("table_with_2_columns.txt", sep="|")
> plot(t$V1,t$V2/1000000000 , main=" title here")
Pairing barplot using as.matrix http://www.statmethods.net/graphs/bar.html
If the argument of barplot is a matrix, beside=TRUE draws grouped bars and beside=FALSE (the default) draws stacked bars.
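A small sketch of the two layouts side by side. The matrix m is hypothetical data made up for illustration: each column is one group on the x axis and each row is one series:

```r
# Sketch: grouped versus stacked bars from the same matrix (hypothetical data).
m <- rbind(size = c(10, 20, 30), count = c(15, 5, 25))
colnames(m) <- c("A", "B", "C")

op <- par(mfrow = c(1, 2))                       # two plots next to each other
mid_grouped <- barplot(m, beside = TRUE, main = "beside=TRUE",
                       legend.text = rownames(m))
mid_stacked <- barplot(m, beside = FALSE, main = "beside=FALSE")
par(op)                                          # restore the previous layout
```

barplot returns the bar midpoints: one per bar (a matrix) when grouped, one per column when stacked, which is useful for placing labels with text().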
http://www.harding.edu/fmccown/r/
t <-read.table("tz.txt", header=TRUE)
barplot(as.matrix(rbind(t$X._size,t$X._count)), main="TimeZones", ylab= "Total",
beside=TRUE, col=rainbow(2), names.arg=t$tz)
legend("topleft", c("%Size","%Records"), cex=0.6, bty="n", fill=rainbow(2))
Stacked Bar Example
barplot(as.matrix(cbind(t$X._size,t$X._count)))
http://www.packtpub.com/article/customizing-graphics-creating-bar-chart-scatterplot-r
http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html
http://www.statmethods.net/graphs/bar.html
http://www.r-tutor.com/ http://www.harding.edu/fmccown/r/
http://stotastic.com/wordpress/2010/04/case-shiller/
Lattice
library("lattice")
p <- barchart((1:10)^2~1:10, horizontal=FALSE, ylim=c(0,120),
  panel=function(...) {
    args <- list(...)
    panel.text(args$x, args$y+2, args$y)  # label each bar with its height
    panel.barchart(...)
  })
print(p)
MyData <- as.data.frame(Titanic)
library(lattice)
barchart(Freq ~ Survived | Age * Sex, groups = Class, data = MyData,
  auto.key = list(points = FALSE, rectangles = TRUE, space = "right",
                  title = "Class", border = TRUE),
  xlab = "Survived", ylim = c(0, 800))