Random notes about the R language

3.0. Arrays are indexed from 1, i.e. index 1 refers to the first element of an array. Commands are separated either by a semicolon (';') or by a newline.

3.1. R is an interpreted programming language. It is free of charge, and its package repository is called CRAN.

3.2. Single quotes and double quotes are interchangeable; both are used to delimit strings.

3.3. All variables created in the console go into its environment, the global environment (an environment is a "scope" in C/C++ terminology).

3.4. Vectorization is the central concept of the language.

3.5. A scalar is a vector of length 1.

3.6. package - an extension module.

Packages are hosted on CRAN, and installed packages can be listed with installed.packages()

base packages - packages that are included in the default R installation.

library - the place on disk where packages are stored (call .libPaths() to see the path)

library(grid) -- load and attach a package (similar to Python's "import")

install.packages("xts", dependencies = TRUE) -- install a package together with its dependencies

update.packages() -- update packages

sessionInfo() -- useful when some packages do not work as expected. Prints version information about R, the OS, and attached or loaded packages.

4. RStudio compared to command-line R:

-- A command history is available

-- The Up/Down arrows step through the previous and next input commands

-- In console text output, the [n] at the start of a line is the index of the first element printed on that line.

-- Hotkeys: https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts

-- Help for any command can be obtained with ?CMD_NAME, e.g. ?paste

5. Fundamental datatypes in R

Vectors (one-dimensional arrays):

Can hold numeric, character or logical values. All elements must have the same type.

There are six atomic vector types: logical, integer, numeric/double, complex, character, raw (byte sequences).

If a vector is constructed from values of different types, implicit conversion to the most general type occurs:

logical -> integer -> double -> complex -> character

For explicit conversion, use functions such as "as.integer".
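A small sketch of the conversion chain above. The variable names are made up for illustration; only base R is assumed.

```r
# Mixing types in c() coerces every element to the most general type present.
mixed_num <- c(TRUE, 2L, 3.5)       # logical and integer are promoted to double
mixed_chr <- c(TRUE, 2L, 3.5, "a")  # one string turns everything into character

print(typeof(mixed_num))
print(mixed_chr)

# Explicit conversion with the as.* family.
from_string <- as.integer("42")
from_double <- as.integer(3.9)      # the fractional part is dropped, not rounded
print(from_string)
print(from_double)
```

Note that as.integer() on a string that does not look like a number yields NA with a warning rather than an error.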

If a binary operation is applied to vectors of different lengths, the shorter one is repeated as many times as needed to match the length of the longer one (the recycling rule).
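The recycling rule can be sketched like this (hypothetical variable names, base R only):

```r
# The shorter vector is repeated until it matches the longer one.
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)

# short is effectively used as 10 20 10 20 10 20
recycled_sum <- long + short
print(recycled_sum)

# If the longer length is not a multiple of the shorter one, R still
# recycles but emits a warning (suppressed here to keep the output clean).
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(partial)
```

A scalar operand is just the length-1 case of the same rule, which is why expressions like long * 2 work.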

A vector, like any other object in R, has properties that always exist, such as length:

x=c(1:3)

length(x)

Attributes - optional additional "things" attached to objects. One example is names.

x=c(1:3);

names(x) = c('a','b','c');

attr(x, "my_attr") <- c(1,2,3) # create your own attribute via attr()

attributes(x) # show all available attributes for object x.

x['b'] # Access element by name

x[2] # Access element by index

vec[..][..] -- the first [] operator extracts a subvector, and the second [] is then applied to that subvector.

a=c(1:10)

[1] 1 2 3 4 5 6 7 8 9 10

a[2:4]

[1] 2 3 4

a[2:4][2]

[1] 3

c(2, 4, 6) # Join elements into a vector

2:10 #An integer sequence

a=c(1:10)

a[1] # First element

a[-1] # All elements of vector, except first element

Matrices (two-dimensional arrays):

Can hold numeric, character or logical values. All elements of a matrix have the same data type. Matrices are filled by columns in R (by default).

a = matrix(1:6, nrow=2)

[,1] [,2] [,3]

[1,] 1 3 5

[2,] 2 4 6

dim(a); nrow(a); ncol(a) # retrieve information about the dimensions of the matrix

rowSums(a); rowMeans(a); colSums(a); colMeans(a); # functions for common operations

a[1,2]; # access an item in the matrix by indices

apply() -- function that applies an operation to each row or to each column
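A minimal sketch of apply() on the matrix from the example above; the second argument (MARGIN) selects rows (1) or columns (2). The result names are made up.

```r
# Same matrix as above: filled by columns, so rows are (1 3 5) and (2 4 6).
m <- matrix(1:6, nrow = 2)

row_max <- apply(m, 1, max)  # MARGIN = 1: one result per row
col_max <- apply(m, 2, max)  # MARGIN = 2: one result per column

print(row_max)
print(col_max)
```

For sums and means the dedicated rowSums()/colMeans() family is usually faster; apply() is the general tool for any other function.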

Lists:

Like vectors, but a list can contain elements of different types.

my_list = list(1:4,"qq")

names(my_list)<-c('a','b') # specify names

# Equivalent ways to reference an element of the list

print(my_list$b) # by $ syntax, using the name of the element

print(my_list[['b']]) # by character string, using the name of the element (single brackets, my_list['b'], return a one-element list instead)

print(my_list[[2]]) # by index of the element

as.list(1:3) # convert a vector to a list (list(1:3) creates a one-element list that holds the whole vector)

unlist(list(1:3)) # convert a list to a vector (elements of different types are coerced to a common type)

my_list[2] # access the 2nd element of the list; the result is a new list with a single element

my_list[[2]] # access the 2nd element of the list; the result has the type of the element itself.

my_list = list(1,2,3); my_list[[2]] <- NULL # Remove item from list

lapply() -- apply a function to each element of a list; the result is a list
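A short sketch comparing lapply() with sapply(), which tries to simplify the result to a vector. The list contents are made up for illustration.

```r
values <- list(a = 1:4, b = c(10, 20))

# lapply always returns a list with one entry per input element.
sums_list <- lapply(values, sum)

# sapply simplifies to a named vector when every result has length 1.
sums_vec <- sapply(values, sum)

print(sums_list)
print(sums_vec)
```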

Data frames (two-dimensional objects):

Can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type.

A two-dimensional table of data, a.k.a. a spreadsheet.

a=data.frame(x=1:6,y=c(T,F)) # construct data frame

nrow(a); ncol(a) # number of rows and number of columns

colnames(a); # names of columns in data frame

length(a) # number of columns

a[1,2] # Data frames are indexed in the same way as matrices.

str(a); summary(a); head(a); tail(a); # Handy for checking that the data frame looks as expected.

Data frames support operations similar to SQL:

--filtering/selecting

a=data.frame(x=1:6,y=c(T,F),z=1); a[a$x>2,]; # select the rows of the data frame whose x is greater than 2

subset(a, x > 2) # equivalent statement

-- exclude some columns (https://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name)

a=data.frame(x=1:6,y=c(T,F),z=1); subset(a, select = -c(z))

--merging/inner join

a=data.frame(x=1:2,y=c(T,F)); b=data.frame(x=1:2,y=c(F,T)); merge(a, b, by = "y");
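A slightly larger sketch of merge(), with made-up tables (left, right) whose keys only partly overlap, so the difference between an inner join and a left join is visible. The all.x argument is part of base R's merge().

```r
left  <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), score = c(20, 30, 40))

# Inner join: only ids present in both tables survive.
inner <- merge(left, right, by = "id")

# Left join: keep every row of `left`; missing scores become NA.
left_join <- merge(left, right, by = "id", all.x = TRUE)

print(inner)
print(left_join)
```

When both tables share a non-key column name, as in the one-line example above, merge() disambiguates the columns with .x and .y suffixes.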

Character:

String type. A character object is a vector of strings, and each string is one element of it. Strings can be written with double or single quotes.

> paste(c("one","two"), "qqqq")

[1] "one qqqq" "two qqqq"

> paste0(c("one","two"), "qqqq")

[1] "oneqqqq" "twoqqqq"

strsplit("qwerty,pppp,qqq", ",|y") # split string into substrings by regexp

tolower('Abc') # convert to lower case

toupper('abc') # convert to upper case

sprintf("Hello %i", 1) # C style sprintf

print/cat/formatC # various ways to perform printing

Factors:

Factors are a hybrid of integers and strings: integer codes, each with a string label (level).

f = factor(c("male", "female", "female", "male"))

> f

[1] male female female male

Levels: female male

> as.numeric(f)

[1] 2 1 1 2

> as.character(f)

[1] "male" "female" "female" "male"

> levels(f)

[1] "female" "male"

f = droplevels(f) # remove unused levels

levels(f) <- c("f","m") # rename levels

ordered(f) # convert a factor into an ordered factor (a slightly different type of factor whose levels can be compared)
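A sketch of what an ordered factor adds, using a made-up "sizes" variable. The levels argument fixes the order explicitly instead of relying on the alphabetical default.

```r
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

# Integer codes follow the declared level order, not the alphabet.
codes <- as.integer(sizes)

# Ordered factors support comparison against a level.
below_large <- sizes < "large"

# table() counts occurrences per level.
counts <- table(sizes)

print(codes)
print(below_large)
print(counts)
```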

Functions

# Define a simple function

myTestFun<-function(n)

{

# Compute the square of integer `n`

# The last expression evaluated in a function becomes the return value, the result of invoking the function.

n*n;

}

Functions are objects, just like vectors and so on; they are first-class objects. Some facts about functions:

1. A function can return a function

2. A function can be defined inside another function

3. Functions can be passed as arguments, and the receiver can then call them, e.g. sapply(arg, func_name) applies func_name to every element of arg.

sapply(2, myTestFun)

(A plain C-style call, func_name(arg), works as well.)

4. When arguments are passed to a function, they are matched to its parameters by the following rules, applied in this order:

4.1 Exact name matching: an argument passed as k=v is bound to the parameter whose full name is "k"

4.2 Partial name matching: otherwise k=v is bound to the parameter for which "k" is a unique prefix (this applies only to parameters declared before "...")

4.3 Positional matching: the remaining unnamed arguments are bound to the remaining parameters in order

5. "..." serves as a "bag" for arguments that were not matched; it can be forwarded to nested function calls, which then pull their arguments from this "bag".

6. In R it is possible to define your own operators (of the form %name%), which few programming languages allow.

7. Number of methods defined for a generic function (R's counterpart of overloading), e.g. for print: length(methods(print))
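The matching rules from item 4, the "..." bag from item 5 and a user-defined operator from item 6 can be sketched together. The function describe() and the operator %+% are hypothetical names invented for this illustration.

```r
# `verbose` is declared before "...", so partial matching applies to it.
describe <- function(value, verbose = FALSE, ...) {
  extras <- list(...)  # everything that was not matched ends up here
  paste0("value=", value, " verbose=", verbose, " extras=", length(extras))
}

by_exact_name <- describe(1, verbose = TRUE)  # rule 4.1: exact name
by_prefix     <- describe(1, verb = TRUE)     # rule 4.2: unique prefix
by_position   <- describe(1, TRUE, 3, 4)      # rule 4.3: 3 and 4 go into "..."

print(by_exact_name)
print(by_prefix)
print(by_position)

# A user-defined infix operator: any function named %something%.
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)
joined <- "ab" %+% "cd"
print(joined)
```

Partial matching is convenient at the console but easy to break when a function later gains a parameter with the same prefix, so full names are the safer choice in scripts.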

Working with files and data sources

XML, httr, rjson, RJSONIO, XLConnect, readxl, read.table/write.table -- read data

DBI, RSQLite, rmongodb -- access to databases

setwd()/getwd() # set and get current working directory

list.files() # Get list of files in current working directory

list.dirs(".") # Get list of directories

Comparison with Python by example

Arguments are passed by copy (value semantics); there are no pointers.

(To improve performance, R uses a copy-on-write mechanism: the copy is made only when the object is modified.)
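A minimal sketch of the value semantics described above; set_first() is a made-up helper.

```r
# The function changes its argument, but only its own local copy.
set_first <- function(vec) {
  vec[1] <- 100
  vec
}

original <- c(1, 2, 3)
changed  <- set_first(original)

print(original)  # the caller's vector is untouched
print(changed)
```

To make a change visible to the caller, the usual idiom is to return the modified object and reassign it, e.g. original <- set_first(original).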

# assignment operator "<-" or "->". The value is read on one side and written to the variable that the arrow points at

p.s. "=" can also be used, but the common style is:

"<-" for variable assignment

"=" in function calls, to pass arguments by name

(in Python these are called keyword arguments)

# comment

x <- 2

s <- "1"

b <- TRUE

b <- T

b <- !FALSE

b <- !F

class(b) # The class of the object - matrix, array, list, data frame, etc.

typeof(b) # The internal storage type of the object

is.character(b)

v=vector(length = 10)

length(v)

# Fixed size vector

# "c" means concatenate

numeric_vector <- c(1, 10, 49)

1:5

3:-1

seq(1,10,by=0.25)

seq(1,10,length.out=4)

some_vector <- c("John Doe", "poker player"); names(some_vector) <- c("Name", "Profession")

a <- c(1, 2, 3)

b <- c(4, 5, 6)

c <- a + b

a=sum(c(1,2,3))

2 > 1

# Numeration of elements in the vector starts from 1

v=c(1,2,3); x=v[1]

c(11,12,13,14)

c(1,3)

2:4

Python-like alternative

Arguments are passed by object reference (the callee can modify mutable objects in place); there are no pointers.

#assignment operator

# similar comment

x = 2

s = "1"

b=True

type(b)

v=[False]*10

len(v)

# Fixed size list

numeric_vector = [1, 10, 49]

range(1,5+1)

range(3,-1-1,-1)

# no such built-in functionality, but similar behaviour can be achieved with extra libraries such as numpy

import numpy as np

np.arange(1,10+0.25,0.25)

# Such naming functionality does not exist

import numpy as np

a=np.array([1,2,3])

b=np.array([4,5,6])

c=a+b

a = np.sum(np.array([1,2,3]))

2 > 1

# Numeration of elements in the vector starts from 0

v=[1,2,3]; x=v[0]

np.array([11,12,13,14])[np.array([1,3])]

range(2,4+1)

a <- c(1, 2, 3)

names(a) <- c('name1', 'name2', 'name3')

a['name2'] <- 2

# Such naming functionality does not exist

c(4, 5, 6) > 5

c(4, 5, 6)[c(4, 5, 6) >= 5]

matrix(1:9, byrow = TRUE, nrow = 3)

c(c(1,2,3),c(1,2,3))

# Bind matrix a.k.a. create matrix from block submatrices

cbind(matrix(1:9,nrow=3), c(1:3))

rbind(matrix(1:9,nrow=3), c(1:3))

# List the variables in the global environment

ls()

(Environment tab in RStudio)

a=matrix(1:9, byrow = TRUE, nrow = 3)

a[1,1]

a[2:3,2:3]

a[,2]

+,-,*,/ are elementwise for matrices and vectors

# Open help about function

?factor

help("factor")

help('sin')

sex <- factor(c("male", "female", "female", "male"))

levels(sex) # female, male

nlevels(sex) # 2

name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")

length(name)

# Definition of vectors

name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")

type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")

diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)

rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)

rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors

planets_df <-data.frame(name, type, diameter, rotation, rings)

Elements can be pulled from a data frame with the same syntax as for matrices. In addition, columns can be extracted by name:

planets_df[,3] -- pull the third column

planets_df[,"diameter"] -- pull the column named "diameter"

planets_df$diameter -- shortcut for the previous command

subset(planets_df, subset = (diameter<1)) -- select the subset of rows that satisfy a predicate

a=c(11,2,1)

a=a[order(a)]

b=list(1,"qq")

b=c(b,1)

# Comparison of numerics

-6 * 5 + 2 >= -10 + 1

# Comparison of character strings

"raining" < "raining dogs"

# Comparison of logicals

TRUE > FALSE

TRUE & TRUE

TRUE | FALSE

! TRUE

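R supports the logical operators & and | for vectors and applies them elementwise.

&&, || -- work on single (length-one) values and short-circuit. Older R versions silently used only the first element of a longer vector; since R 4.3.0 a longer vector is an error.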
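A sketch of the difference between the elementwise and the scalar operators; all variable names are made up.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# & and | combine the vectors element by element.
elementwise_and <- x & y
elementwise_or  <- x | y
print(elementwise_and)
print(elementwise_or)

# && short-circuits: the right-hand side is skipped when the left is FALSE,
# which makes guards like this one safe.
value <- NULL
is_positive <- !is.null(value) && value > 0
print(is_positive)

# any() and all() reduce a logical vector to the single value that
# if() and && expect.
print(any(x))
print(all(x))
```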

print("Hello")

if (TRUE) {

} else if (FALSE) {

} else {

# "else" must stay on the same line as the closing "}"; moving it to a new line is a syntax error at top level

}

repeat {print(1);}

while (TRUE) {print(1);}

for (i in 1:4){print(i);}

# Special vectorized ifelse() condition for the elements of a vector

# Take 8 uniformly distributed numbers

runif(8)

[1] 0.09716141 0.43703337 0.74234746 0.45379926 0.70599167 0.13407545 0.48520433 0.04240978

# Derive a vector of logical values from such numbers

runif(8) > 0.5

[1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE

# Use ifelse() to map each logical value to a label

ifelse(runif(8) > 0.5, "Y", "N")

[1] "Y" "N" "N" "Y" "N" "N" "Y" "Y"

# functions

myadd <- function(x) {x=x+10}

myadd <- function(x) {x=x+10; return(x);}

myadd <- function(x) {my_global <<- x+10}

# remove global variable

ls()

rm(my_global)

# remove all variables from global scope

rm(list=ls())

# With named-argument syntax the argument name need not be the full name; a unique prefix among all of the function's parameter names is enough

f <- function(param){return(param+2);}

f(par=2)

[1] 4

# repeat vector 3 times

rep(c(1,2,3), times=3)

# repeat each item of vector by 3 times

rep(c(1,2,3), each=3)

# compare floating-point values with a tolerance

all.equal(0.1 + 0.2, 0.3)
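A sketch of why all.equal() is needed for floating-point values. Wrapping it in isTRUE() is the usual idiom, because all.equal() returns a descriptive string rather than FALSE when the values differ.

```r
a <- 0.1 + 0.2

# Exact comparison fails because of binary rounding error.
exact_equal <- (a == 0.3)

# all.equal() compares with a small tolerance.
close_enough <- isTRUE(all.equal(a, 0.3))

print(exact_equal)
print(close_enough)
```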

NULL

5 %% 2

sqrt(1:4)

break

system.time({for (i in 1:5000000) sqrt(5)})

# Different semantics of -n for a vector

# -n excludes the n-th item

a=c(0,1,2,3,4)

a[-1]

>> 1 2 3 4

x=c(2,2,0,0,10)

which.min(x)

[1] 3

which(x==min(x))

[1] 3 4

# matrix multiplication as in linear algebra

matrix(1:4, nrow=2) %*% matrix(1:4, nrow=2)

# transpose matrix

t(matrix(1:4, nrow=2))

... -- a variable number of arguments for a function

If a function has no explicit return(), its result is the value of the last evaluated expression
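Both points can be sketched in one made-up function, total(), which accepts any number of arguments through "..." and relies on the last expression as its return value.

```r
total <- function(...) {
  args <- c(...)  # collect all passed arguments into one vector
  sum(args)       # last evaluated expression, so this is the result
}

three_args <- total(1, 2, 3)
one_arg    <- total(5)

print(three_args)
print(one_arg)
```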

# Extensions for scripts

*.R

np.array([4,5,6]) > 5

np.array([4,5,6])[np.array([4,5,6]) >= 5]

np.reshape(np.array([1,2,3,4,5,6,7,8,9]), (3,3))

np.concatenate((np.array([1,2,3]), np.array([1,2,3])))

dir()

a=np.reshape(np.array([1,2,3,4,5,6,7,8,9]), (3,3))

a[0,0]

a[1:3,1:3]

a[:,1]

+,-,*,/ are elementwise for numpy arrays (matrices and vectors)

help(dir)

No direct equivalent of factors; the nearest built-in concept is the set.

name=["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]

len(name)

a=[11,2,1]

a.sort()

b=[1,"qq"]

b +=[1]

# Comparison of numerics

-6 * 5 + 2 >= -10 + 1

# Comparison of character strings

"raining" < "raining dogs"

# Comparison of logicals

True> False

True and True

True or False

not True

Plain Python lists do not support elementwise logical operators; numpy arrays do (&, |, ~, or np.logical_and, np.logical_or, np.logical_not).

print("Hello")

if True:

    pass

elif False:

    pass

else:

    pass

while True: print(1)

for i in range(1,4 + 1): print(i)

# No built-in equivalent for lists (for numpy arrays, np.where(cond, "Y", "N") plays this role)

# functions

def myadd(x): x=x+10

def myadd(x): x=x+10; return x;

def myadd(x): global my_global; my_global = x+10;

# remove global variable

dir()

del my_global

# remove all variables from global scope

for to_remove in [i for i in dir() if not i.startswith("_")]: del globals()[to_remove];

# With keyword-argument syntax the argument name must be the full name

f=lambda param: param+2

f(par=12)

Traceback ....<lambda>() got an unexpected keyword argument 'par'

import numpy as np

np.tile(np.array([1,2,3]), 3)

np.repeat(np.array([1,2,3]), 3)

# Not a built-in operator, but math.isclose() (Python 3.5+) and np.isclose() compare with a tolerance

None

5 % 2

np.sqrt(np.array([1,2,3,4]))

continue

break

import timeit

timeit.timeit("math.sqrt(5)", setup='import math', number=5000000)

# Different semantics of -n for a list

# -n gets the n-th item from the tail of the list

a=[0,1,2,3,4]

a[-1]

>> 4

def func(*k): pass

In Python: if a function has no return statement, the result is None

In C/C++: a missing return value in a non-void function leads to undefined behaviour

# Extension for scripts

*.py
