0. Materials:
[1] https://www.datacamp.com/courses/free-introduction-to-r
[2] https://stepik.org/course/497
[3] http://www.rdocumentation.org/packages/base/versions/3.4.3/
[4] https://rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf - useful cheat sheet
[5] https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts - RStudio keyboard shortcuts
1. RStudio installation
Linux (Debian/Ubuntu):
https://www.rstudio.com/products/rstudio/download/#download
sudo apt-get install libjpeg62
sudo dpkg -i rstudio-xenial-1.1.442-amd64.deb
sudo apt-get install -f
sudo apt-get install r-base
Windows:
https://cloud.r-project.org/ => by default the R binaries are installed in "C:/Program Files/R/R-3.4.4/bin"
https://www.rstudio.com/products/RStudio/
2. Documentation:
https://cran.r-project.org/doc/FAQ/R-FAQ.html
3. About the language:
3.0. Arrays are 1-indexed: index 1 refers to the first element. Commands are separated either by a semicolon (‘;’) or by a newline.
3.1. R is an interpreted programming language. It is free software, and its package repository is called CRAN (Comprehensive R Archive Network).
3.2. Single and double quotes are equivalent; both are used for string literals.
3.3. Variables created in the console go to the global environment (roughly, a scope in C/C++ terms).
3.4. Vectorization is the central concept of the language: most operations and functions work on whole vectors, element by element.
3.5. A scalar is simply a vector of length 1.
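A small sketch of points 3.4 and 3.5: a "scalar" is a length-1 vector, and arithmetic applies to every element without an explicit loop. The variable names are arbitrary example data.

```r
# A "scalar" is simply a numeric vector of length 1
x <- 5
is.vector(x)      # TRUE
length(x)         # 1

# Arithmetic is vectorized: it is applied to every element, no loop needed
v <- c(1, 2, 3)
doubled <- v * 2  # 2 4 6
squares <- v^2    # 1 4 9
```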
3.6. Package - a module of extensions.
Packages are distributed through CRAN; installed packages can be listed with installed.packages()
base packages - packages that ship with the default R installation (e.g. base, stats, utils).
library - a directory on disk where packages are installed (call .libPaths() to see these directories)
library(grid) -- attach a package (similar to Python "import")
install.packages("xts", dependencies = TRUE) -- install a package together with its dependencies
update.packages() -- update installed packages
sessionInfo() -- useful when some packages do not work as expected: prints version information about R, the OS, and attached or loaded packages.
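A minimal sketch of the package workflow above, using only functions that ship with R. The package grid is chosen as an assumption that keeps the example offline: it is part of the default installation, so nothing has to be downloaded.

```r
# Check whether a package is installed before attaching it.
# "grid" ships with every R installation, so this branch should run.
pkg <- "grid"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)  # attach a package whose name is stored in a variable
}

# Directories (libraries) where packages are installed
lib_dirs <- .libPaths()
print(lib_dirs)

# Is a given package installed? installed.packages() returns a matrix with one row per package
grid_installed <- pkg %in% rownames(installed.packages())
```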
4. RStudio compared to command-line R:
-- A History pane with previously entered commands is available
-- Up/Down arrows recall the previous/next input command
-- In console output, the prefix [n] gives the index of the first element printed on that line
-- Hotkeys: https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts
-- Help for any command can be obtained from ?CMD_NAME, e.g. ?paste
5. Fundamental datatypes in R
Vectors (one dimensional array):
Can hold numeric, character or logical values. All elements must have the same type.
There are six atomic vector types: logical, integer, double (numeric), complex, character, and raw (byte sequences).
If a vector is built from values of different types, they are implicitly coerced to the most general type, in this order:
logical->integer->double->character
For explicit conversion, use the as.* functions, e.g. as.integer().
If a binary operation is applied to vectors of different lengths, the shorter one is repeated until it matches the length of the longer one (the recycling rule); R warns if the longer length is not a multiple of the shorter one.
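A short sketch of implicit coercion, explicit conversion and recycling as described above. The values are arbitrary example data.

```r
# Implicit coercion: the most general type wins
x <- c(TRUE, 2L, 3.5)
typeof(x)                        # "double"; TRUE becomes 1
y <- c(1, "a")
typeof(y)                        # "character"; 1 becomes "1"

# Explicit conversion with the as.* family
n <- as.integer("42")            # 42L

# Recycling: c(10, 20) is repeated to length 4 before the addition
r <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```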
Like every R object, a vector has properties that always exist, such as its length:
x=c(1:3)
length(x)
Attributes are optional metadata attached to an object. One example is names:
x=c(1:3);
names(x) = c('a','b','c');
attr(x, "my_attr") <- c(1,2,3) # create your own attribute
attributes(x) # show all available attributes for object x.
x['b'] # Access element by name
x[2] # Access element by index
vec[..][..] -- the first [] extracts a subvector, and the second [] is then applied to that subvector.
a=c(1:10); a
[1] 1 2 3 4 5 6 7 8 9 10
a[2:4]
[1] 2 3 4
a[2:4][2]
[1] 3
c(2, 4, 6) # Join elements into a vector
2:10 #An integer sequence
a=c(1:10)
a[1] # First element
a[-1] # All elements of vector, except first element
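A short sketch of further indexing forms built on the same [] operator; the vector a here is new example data.

```r
a <- c(10, 20, 30, 40, 50)
picked  <- a[c(1, 3)]         # 10 30: several positions at once
dropped <- a[-c(1, 2)]        # 30 40 50: negative indices drop elements
big     <- a[a > 25]          # 30 40 50: logical index built from a condition
every2  <- a[c(TRUE, FALSE)]  # 10 30 50: the logical index is recycled
beyond  <- a[7]               # NA: an out-of-range index gives NA, not an error
```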
Matrices (two dimensional array):
Can hold numeric, character or logical values. All elements of a matrix have the same data type. By default, matrices are filled column by column.
a = matrix(1:6, nrow=2)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
dim(a); nrow(a); ncol(a) # retrieve the dimensions of the matrix
rowSums(a); rowMeans(a); colSums(a); colMeans(a); # row/column sums and means
a[1,2]; # access an element by its row and column indices
apply() -- applies a function over rows (MARGIN = 1) or over columns (MARGIN = 2)
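A minimal sketch of apply() on the matrix above, showing both margins and an anonymous function; the row_span computation is only an illustrative choice.

```r
a <- matrix(1:6, nrow = 2)       # filled by columns: rows are 1 3 5 and 2 4 6
row_totals <- apply(a, 1, sum)   # MARGIN = 1, per row: 9 12
col_max    <- apply(a, 2, max)   # MARGIN = 2, per column: 2 4 6
# Any function works, e.g. an anonymous one computing max - min of each row
row_span   <- apply(a, 1, function(r) max(r) - min(r))  # 4 4
```

For plain sums and means, rowSums()/colSums() give the same result as apply() and are faster.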
Lists:
Like vectors, but the elements of a list can have different types.
my_list = list(1:4,"qq")
names(my_list)<-c('a','b') # set names
# Ways to access an element of the list
print(my_list$b) # $ syntax with the element name
print(my_list[['b']]) # [[ ]] with the element name as a string
print(my_list[[2]]) # [[ ]] with the element index
print(my_list['b']) # note: single [ ] returns a sub-list containing 'b', not the element itself
as.list(1:3) # converts a vector to a list with one element per item (list(1:3) instead wraps the whole vector as a single element)
unlist(list(1:3)) # flattens a list into a vector (mixed element types are coerced to a common type)
my_list[2] # access the 2nd element of the list; the result is a new list with a single element
my_list[[2]] # access the 2nd element of the list; the result has the type of the element itself
my_list = list(1,2,3); my_list[[2]] <- NULL # remove an item from the list
lapply() -- applies a function to each element of a list and returns a list (sapply() simplifies the result to a vector when possible)
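A short sketch contrasting [ ] with [[ ]] and lapply() with sapply(); my_list and nums are example data.

```r
my_list <- list(a = 1:4, b = "qq")
class(my_list["b"])     # "list": single brackets return a sub-list
class(my_list[["b"]])   # "character": double brackets return the element

nums <- list(x = 1:4, y = c(2.5, 3.5))
means_list <- lapply(nums, mean)  # a list: x = 2.5, y = 3
means_vec  <- sapply(nums, mean)  # a named numeric vector: c(x = 2.5, y = 3)

nums$y <- NULL                    # assigning NULL removes the element
```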
Data frames (two-dimensional objects):
Can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type.
A two-dimensional table of data, similar to a spreadsheet.
a=data.frame(x=1:6,y=c(T,F)) # construct a data frame (y is recycled to 6 rows)
nrow(a); ncol(a) # number of rows and number of columns
colnames(a); # column names
length(a) # number of columns
a[1,2] # data frames can be indexed like matrices
str(a); summary(a); head(a); tail(a); # quick checks of a data frame's structure and contents
Data frames support SQL-like operations:
-- filtering/selecting
a=data.frame(x=1:6,y=c(T,F),z=1); a[a$x>2,]; # select rows where column x is greater than 2
subset(a, x > 2) # equivalent statement
-- exclude some columns (https://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name)
a=data.frame(x=1:6,y=c(T,F),z=1); subset(a, select = -c(z))
-- merging (merge() performs an inner join by default)
a=data.frame(x=1:2,y=c(T,F)); b=data.frame(x=1:2,y=c(F,T)); merge(a, b, by = "y");
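A sketch mapping common SQL clauses onto base R, assuming two small hypothetical tables a and b that share an id column. The all.x argument of merge() turns the default inner join into a left join.

```r
a <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
b <- data.frame(id = c(2L, 4L, 5L), y = c("b", "d", "e"))

# WHERE: keep rows matching a condition
filtered <- a[a$x > 15, ]                       # ids 2, 3, 4
# SELECT: keep only some columns
only_x <- subset(a, select = x)
# INNER JOIN: only ids present in both tables (2 and 4)
inner <- merge(a, b, by = "id")
# LEFT JOIN: all rows of a; y is NA where b has no match
left <- merge(a, b, by = "id", all.x = TRUE)
# ORDER BY x DESC
sorted <- a[order(a$x, decreasing = TRUE), ]
```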
Character:
String type. A character value is in fact a vector of strings, and each string is one element of it. Strings can be written with double or single quotes.
> paste(c("one","two"), "qqqq")
[1] "one qqqq" "two qqqq"
> paste0(c("one","two"), "qqqq")
[1] "oneqqqq" "twoqqqq"
strsplit("qwerty,pppp,qqq", ",|y") # split a string into substrings by a regular expression (returns a list)
tolower('Abc') # convert to lower case
toupper('abc') # convert to upper case
sprintf("Hello %i", 1) # C-style formatted output
print/cat/formatC # various ways to print and format values
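A few more everyday string operations, sketched with arbitrary example strings; all functions used here are part of base R.

```r
s <- c("one", "two")
joined <- paste(s, collapse = "+")                 # "one+two": collapse merges the elements
n      <- nchar("hello")                           # 5
msg    <- sprintf("%s has %d chars", "hello", n)   # "hello has 5 chars"
middle <- substr("abcdef", 2, 4)                   # "bcd"
parts  <- strsplit("a,b,c", ",")[[1]]              # strsplit returns a list; [[1]] gives the vector
has_b  <- grepl("b", c("abc", "xyz"))              # TRUE FALSE: regex match per element
pi_str <- formatC(3.14159, digits = 3, format = "f")  # "3.142"
```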
Factors:
Factors are a hybrid between integers and strings: categorical values are stored as integer codes together with a vector of labels (levels).
f = factor(c("male", "female", "female", "male"))
> f
[1] male female female male
Levels: female male
> as.numeric(f)
[1] 2 1 1 2
> as.character(f)
[1] "male" "female" "female" "male"
> levels(f)
[1] "female" "male"
f = droplevels(f) # remove unused levels
levels(f) <- c("f","m") # rename levels
ordered(f) # convert a factor into an ordered factor (its levels have an order, so comparisons such as < work)
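A sketch of counting factor levels and of an ordered factor with an explicit level order; the sizes data is a hypothetical example.

```r
f <- factor(c("male", "female", "female", "male"))
counts <- table(f)          # female 2, male 2
codes  <- as.integer(f)     # 2 1 1 2: levels are sorted alphabetically by default

# An ordered factor with an explicit level order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
smaller <- sizes < "large"  # TRUE FALSE TRUE: comparisons follow the level order
biggest <- max(sizes)       # large
```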
Functions
# Define a simple function
myTestFun<-function(n)
{
# Compute the square of integer `n`
# The last expression evaluated in a function becomes the return value, the result of invoking the function.
n*n;
}
Functions are objects, just like vectors and other values: they are first-class objects. Some facts about functions:
1. A function can return a function.
2. A function can be defined inside another function.
3. Functions can be passed as arguments to other functions; e.g. sapply(arg, func_name) applies func_name to each element of arg:
sapply(2, myTestFun)
(Calling the function directly, as func_name(arg), works as well.)
4. When a function is called, arguments are matched to parameters in the following order:
4.1 Exact matching: an argument passed as k=v is bound to the parameter whose full name is "k".
4.2 Partial matching: if "k" is a unique prefix of a parameter name, it is bound to that parameter (only for parameters listed before "..." in the definition).
4.3 Positional matching: the remaining unnamed arguments are bound to the remaining parameters in order.
5. "..." acts as a "bag" that collects arguments not matched to any parameter; it can be passed on to inner function calls, which then match those arguments against their own parameters.
6. In R you can define your own infix operators, named %name% (e.g. %+%).
7. print() is a generic function; the number of methods defined for it can be checked with length(methods(print)).
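A sketch illustrating points 1, 2, 4, 5 and 6 above. The names make_adder, scale_value, dash_paste and %+% are hypothetical, chosen only for illustration.

```r
# 1-2: a function defined inside another function and returned from it (a closure)
make_adder <- function(n) {
  function(x) x + n   # the inner function remembers n
}
add10 <- make_adder(10)
r1 <- add10(5)                         # 15

# 4.2: partial matching, "tw" is a unique prefix of the parameter "twice"
scale_value <- function(value, twice = FALSE) if (twice) value * 2 else value
r2 <- scale_value(3, tw = TRUE)        # 6

# 5: "..." collects extra arguments and forwards them to an inner call
dash_paste <- function(...) paste(..., sep = "-")
r3 <- dash_paste("a", "b", "c")        # "a-b-c"

# 6: a user-defined infix operator
`%+%` <- function(a, b) paste0(a, b)
r4 <- "ab" %+% "cd"                    # "abcd"
```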
Working with files and data sources
XML, httr, rjson, RJSONIO, XLConnect, readxl, read.table/write.table -- reading and writing data
DBI, RSQLite, rmongodb -- access to databases
setwd()/getwd() # set and get the current working directory
list.files() # list files in the current working directory
list.dirs(".") # list directories
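A minimal round trip through a CSV file with base R. The data frame is example data, and the file is written to a temporary path (tempfile()) so the working directory is not touched.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# tempfile() returns a path inside the per-session temporary directory
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)   # row.names = FALSE: no extra index column

df2 <- read.csv(path)
csv_files <- list.files(tempdir(), pattern = "\\.csv$")
```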
Compare by example with Python (each group of R snippets is followed by the Python equivalents, in the same order)
R:
Arguments are passed by value (semantically a copy); there are no pointers.
(For performance, R uses copy-on-write: the actual copy is made only when the argument is modified.)
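A sketch of the value semantics described above: assigning into an argument inside a function changes only the function's own copy. The function name modify is hypothetical.

```r
modify <- function(v) {
  v[1] <- 100   # modifies the function's local copy only
  v
}
original <- c(1, 2, 3)
changed  <- modify(original)
# original is still 1 2 3: the caller's vector is untouched
# changed is 100 2 3
```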
# assignment operators "<-" and "->": the value is assigned in the direction of the arrow
P.S. "=" can also be used for assignment, but the common style is:
"<-" for assigning variables
"=" for passing arguments by name in a function call
(in Python these are called keyword arguments)
# comment
x <- 2
s <- "1"
b <- TRUE
b <- T
b <- !FALSE
b <- !F
class(b) # the class of the object: matrix, array, list, data frame, etc.
typeof(b) # the internal storage type of the items
is.character(b)
v=vector(length = 10)
length(v)
# Fixed size vector
# "c" means concatenate
numeric_vector <- c(1, 10, 49)
1:5
3:-1
seq(1,10,by=0.25)
seq(1,10,length.out=4)
some_vector <- c("John Doe", "poker player"); names(some_vector) <- c("Name", "Profession")
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- a + b
a=sum(c(1,2,3))
2 > 1
# Indexing of vector elements starts from 1
v=c(1,2,3); x=v[1]
c(11,12,13,14)[c(1,3)] # select elements by a vector of positions: 11 13
2:4
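Python alternative:
Arguments are passed by object reference; there are no explicit pointers, and changes made to a mutable argument (e.g. a list) inside a function are visible to the caller.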
#assignment operator
=
# similar comment
x = 2
s = "1"
b=True
type(b)
v=[False]*10
len(v)
# Fixed size list
numeric_vector = [1, 10, 49]
range(1,5+1)
range(3,-1-1,-1)
# no such built-in functionality, but similar behaviour is available in extra libraries like numpy
import numpy as np
np.arange(1,10+0.25,0.25)
np.linspace(1,10,4) # like seq(1,10,length.out=4)
# Such naming functionality does not exist (a dict is the nearest equivalent)
import numpy as np
a=np.array([1,2,3])
b=np.array([4,5,6])
c=a+b
a = np.sum(np.array([1,2,3]))
2 > 1
# Indexing of list elements starts from 0
v=[1,2,3]; x=v[0]
np.array([11,12,13,14])[np.array([1,3])]
range(2,4+1)
R (continued):
a <- c(1, 2, 3)
names(a) <- c('name1', 'name2', 'name3')
a['name2'] <- 2
# (Python: such naming functionality does not exist; a dict is the nearest equivalent)
c(4, 5, 6) > 5
c(4, 5, 6)[c(4, 5, 6) >= 5]
matrix(1:9, byrow = TRUE, nrow = 3)
c(c(1,2,3),c(1,2,3))
# Bind matrices column-wise (cbind) or row-wise (rbind), i.e. build a matrix from blocks
cbind(matrix(1:9,nrow=3), c(1:3))
rbind(matrix(1:9,nrow=3), c(1:3))
# List the variables in the global environment
ls()
(also shown in the Environment tab in RStudio)
a=matrix(1:9, byrow = TRUE, nrow = 3)
a[1,1]
a[2:3,2:3]
a[,2]
+, -, *, / are elementwise for matrices and vectors
# Open help for a function
?factor
help("factor")
help('sin')
sex <- factor(c("male", "female", "female", "male"))
levels(sex) # female, male
nlevels(sex) # 2
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
length(name)
# Definition of vectors
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
# Create a data frame from the vectors
planets_df <-data.frame(name, type, diameter, rotation, rings)
Elements of a data frame can be extracted with the same syntax as for matrices. In addition, columns can be extracted by name:
planets_df[,3] -- extract the third column
planets_df[,"diameter"] -- extract the "diameter" column by name
planets_df$diameter -- shortcut for the previous command
subset(planets_df, subset = (diameter<1)) -- select the subset of rows that satisfy a predicate
a=c(11,2,1)
a=a[order(a)]
b=list(1,"qq")
b=c(b,1)
# Comparison of numerics
-6 * 5 + 2 >= -10 + 1
# Comparison of character strings
"raining" < "raining dogs"
# Comparison of logicals
TRUE > FALSE
TRUE & TRUE
TRUE | FALSE
! TRUE
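The & and | operators work elementwise on vectors.
&&, || -- operate on single values (length-1 vectors) and short-circuit; older R versions silently used only the first element, and since R 4.3.0 a longer operand is an error.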
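A sketch of the difference between the elementwise and the short-circuit operators; x and y are example data.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
both   <- x & y   # TRUE FALSE FALSE: elementwise
either <- x | y   # TRUE TRUE TRUE

# && and || expect single values and short-circuit:
# the right side is not evaluated when the left side already decides the result
z <- FALSE && stop("never evaluated")   # FALSE, no error is raised
```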
print("Hello")
if (TRUE) {
} else if (FALSE) {
} else {
# "else" must stay on the same line as the preceding "}": at top level, a line starting with "else" is a syntax error
}
repeat {print(1);}
while (TRUE) {print(1);}
for (i in 1:4){print(i);}
# ifelse(): a vectorized conditional, applied to each element of a vector
# Draw 8 uniformly distributed random numbers
runif(8)
[1] 0.09716141 0.43703337 0.74234746 0.45379926 0.70599167 0.13407545 0.48520433 0.04240978
# Compare a new random sample with 0.5 to get a vector of logical values
runif(8) > 0.5
[1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
# ifelse chooses "Y" or "N" for each element
ifelse(runif(8) > 0.5, "Y", "N")
[1] "Y" "N" "N" "Y" "N" "N" "Y" "Y"
# functions
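myadd <- function(x) {x=x+10} # returns x+10: the value of the last expression (returned invisibly)
myadd <- function(x) {x=x+10; return(x)} # explicit return: return() is a function call in R
myadd <- function(x) {my_global <<- x+10} # "<<-" assigns to my_global in the enclosing (here global) environment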
# remove global variable
ls()
rm(my_global)
# remove all variables from global scope
rm(list=ls())
# With named arguments, the name does not have to be complete: any prefix that uniquely identifies one of the function's parameters works (partial matching)
f <- function(param){return(param+2);}
f(par=2)
[1] 4
# repeat vector 3 times
rep(c(1,2,3), times=3)
# repeat each item of vector by 3 times
rep(c(1,2,3), each=3)
# compare floating-point numbers with a tolerance
isTRUE(all.equal(0.1 + 0.2, 0.3))
NULL
5 %% 2
sqrt(1:4)
next
break
system.time({for (i in 1:5000000) sqrt(5)})
# Negative indices have a different meaning than in Python
# -n excludes the n-th element
a=c(0,1,2,3,4)
a[-1]
[1] 1 2 3 4
x=c(2,2,0,0,10)
which.min(x)
[1] 3
which(x==min(x))
[1] 3 4
# matrix multiplication as in linear algebra
matrix(1:4, nrow=2) %*% matrix(1:4, nrow=2)
# transpose matrix
t(matrix(1:4, nrow=2))
... -- accepts an arbitrary number of arguments in a function definition
If a function has no return(), its result is the value of the last evaluated expression
# Script file extension
*.R
Python (continued):
np.array([4,5,6]) > 5
np.array([4,5,6])[np.array([4,5,6]) >= 5]
np.reshape(np.array([1,2,3,4,5,6,7,8,9]), (3,3))
np.concatenate((np.array([1,2,3]), np.array([1,2,3])))
dir()
a=np.reshape(np.array([1,2,3,4,5,6,7,8,9]), (3,3))
a[0,0]
a[1:3,1:3]
a[:,1]
+, -, *, / are elementwise for NumPy arrays
help(dir)
Not represented; the nearest concept is the set (pandas provides a "category" dtype).
name=["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
len(name)
a=[11,2,1]
a.sort()
b=[1,"qq"]
b +=[1]
# Comparison of numerics
-6 * 5 + 2 >= -10 + 1
# Comparison of character strings
"raining" < "raining dogs"
# Comparison of logicals
True> False
True and True
True or False
not True
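Python's and/or/not work on single values only; for NumPy boolean arrays use the elementwise operators &, |, ~ (or np.logical_and, np.logical_or, np.logical_not).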
print("Hello")
if True:
    pass
elif False:
    pass
else:
    pass
while True: print(1)
for i in range(1,4 + 1): print(i)
np.where(np.random.rand(8) > 0.5, "Y", "N") # NumPy's vectorized equivalent of ifelse
# functions
def myadd(x): x=x+10
def myadd(x): x=x+10; return x;
def myadd(x): global my_global; my_global = x+10;
# remove global variable
dir()
del my_global
# remove all variables from global scope
for to_remove in [i for i in dir() if not i.startswith("_")]: del globals()[to_remove];
# With keyword arguments, the full parameter name is required (no partial matching)
f=lambda param: param+2
f(par=12)
Traceback ....<lambda>() got an unexpected keyword argument 'par'
import numpy as np
np.tile(np.array([1,2,3]), 3) # like rep(..., times=3)
np.repeat(np.array([1,2,3]), 3) # like rep(..., each=3)
import math; math.isclose(0.1 + 0.2, 0.3) # like all.equal(); np.allclose() for arrays
None
5 % 2
np.sqrt(np.array([1,2,3,4]))
continue
break
import timeit
timeit.timeit("math.sqrt(5)", setup='import math', number=5000000)
# Negative indices have a different meaning than in R
# -n gets the n-th element from the end
a=[0,1,2,3,4]
a[-1]
>> 4
def func(*k): pass
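In Python: if a function has no return statement, it returns None
In C/C++: in C++, flowing off the end of a non-void function (other than main) is undefined behaviour; in C it is undefined only if the caller uses the returned value
# Script file extension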
*.py