Data Analytics with R
This course is designed for the 3rd Semester MBA students of the School of Management, Techno India Group. The lectures focus entirely on data analytics with the R software. The first two to three classes will help all students develop a basic working knowledge of R; the following three classes will cover data handling and the associated statistical analysis.
Acknowledgement:
I would like to thank my supervisor, Prof. Sabyasachi Bhattacharya, and the Dean of the School of Management, Techno India, Prof. Amit Kundu, for giving me the opportunity to take these classes.
Introduction to the R software:
R was begun in the 1990s by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland. Nearly 20 senior statisticians form the core development group of the R language, including John Chambers of Bell Labs, the primary developer of the original S language. R contains a large, coherent, integrated collection of intermediate tools for data analysis. The software is FREE and can be downloaded from http://www.r-project.org/. Its versatility lies in the fact that it can be used in every domain, and its coding structure is easier than that of most alternatives. Moreover, R is not used only for coding: you can also prepare manuscripts and all sorts of presentations in it; R Markdown helps in this case. A major strength of the software is its packages. Packages are collections of R functions, data, and compiled code in a well-defined format, and the directory where packages are stored is called the library. Currently, the CRAN package repository features 10964 available packages (https://cran.r-project.org/web/packages/). To install packages, type install.packages(pkgs, dependencies = TRUE) in the console.
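For instance, to install and load one specific package (ggplot2 here is only an illustrative choice, not a course requirement), you would run:
install.packages("ggplot2", dependencies = TRUE) ## download and install the package from CRAN
library(ggplot2) ## load the installed package into the current session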
I have used this software in most of my research work. For any sort of discrepancy, do not hesitate to contact me; the contact details are provided on my home page. Don't worry, this page will be updated as per your needs.
References:
An Introduction to R, Longhow Lam. (Link: https://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf)
Applied Statistical Inference: Likelihood and Bayes, Leonhard Held and Daniel Sabanes Bove, Springer-Verlag Berlin, 2014.
The R Student Companion, Brian Dennis, CRC Press, 2013.
An Introduction to Statistical Learning with Applications in R, James, Witten, Hastie and Tibshirani, Springer Texts in Statistics, 2013.
Using R for Numerical Analysis in Science and Engineering, Victor A. Bloomfield, CRC Press, Taylor and Francis Group, 2014.
A Primer of Ecology with R, M. Henry H. Stevens, Springer, 2009.
Statistical Modeling: The Two Cultures, Leo Breiman, Statistical Science 2001, Vol. 16, No. 3, 199-231.
The Art of R Programming, Norman Matloff.
An R Companion for the Handbook of Biological Statistics, Salvatore S. Mangiafico: https://rcompanion.org/documents/RCompanionBioStatistics.pdf
Day 1 : 16th January, 2021. (12.30 p.m. - 1.30 p.m.)
1. Introduction to R Programming:
Before diving into the core statistical analysis of data sets, we should become a little accustomed to the R software. The following lecture will help you establish a basic understanding of it.
x = 2 ### storing a particular value
##### To run any code, press "Ctrl + Enter" together on your keyboard.
print(x) #### Print that value
x
y = 7
print(y)
y
###### Some Fundamental operations ###
z1 = x+y ### Addition of two numbers
z1
z2 = x-y ### Subtraction of two numbers
z2
z3 = x*y; z3 ### Multiplication of two numbers
z4 = x/y; z4 #### Division of two numbers
##### Storing more than one number and performing the same operations as above
### Vector creation
x1 = c(3, 5, 2, 6) ##### 4 numbers are stored here; c stands for concatenation
length(x1) ##### Check how many numbers are stored
y1 = c(10, 5, 3, 9)
length(y1)
#### Repeat the same process
a = x1 + y1; a
e = sum(x1, y1); e #### sum() adds up all the elements of both vectors, so e equals sum(a)
b = x1 - y1; b
c = x1*y1; c
d = x1/y1; d
#### Sequence of number generation
#### There are two ways of generating a sequence of numbers.
##### The first method (the colon operator) applies only when the common
##### difference between the numbers is 1, while the
###### second method (seq) works for any common difference.
x1 = 1:10; x1 ### generates the first 10 natural numbers
x2 = seq(from = 1, to = 10, by = 3) ### Note that the common difference between the numbers is 3
#### Length wise number generation
x3 = seq(1, 50, length.out = 30) #### Some of the generated numbers will be fractional
Day 2 : 19th January, 2021. (9.00 a.m. - 11.00 a.m.)
Today we divide the two-hour class into two sessions. The first half continues the basic course, i.e. the introduction portion, and the second half is on the graphics course in the R software. The following material is also divided in the same way.
2. Continuation of the "Introduction" portion
#### The keyboard shortcut for opening a new R script is
### Ctrl+Shift+N
### Use of some in-built functions ###
x = -10
abs(x) ### Returns the absolute value
y = 1.02365
round(y) ### Rounds off to the nearest integer
round(y, digits= 3)
z = log(1)
z
q = log(2, base = 10)
q
p = log(10, base = 10)
p
print(p)
print("The line is required") ### Line will be appeared under double quotes
cat("The line is required") ### Line will be not appeared under double quotes
### Repeating any number
a = rep(2, 10)
length(a)
b = rep(0, 10)
b
c = numeric(15) ### A vector of 15 zeros appears
c
length(c)
rep(NA, 5)
d = seq(1, 10)
d
head(d, 3)
tail(d, 2)
tail(d, 5)
e = seq(1, 10, by = 3)
e
f = seq(from = 1, to = 10, length.out = 5)
f
######
x = c(5, 7.3, 8, 9.2, 6.3, 4, 10)
x
sum(x)
x^2
max(x)
min(x)
mean(x)
var(x)
sd(x)
s = sort(x)
length(x)
median.position = (length(x)+1)/2 ## this formula assumes length(x) is odd (here it is 7)
median.position
median = s[median.position]
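## A quick added check: the built-in median() should agree with the
## manual computation above (which assumes an odd number of observations)
median(x)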
####
x1 = c(-5, 6, -9, 2)
x1>0
x1<0
a1 = which(x1>0)
length(a1)
b1 = which(x1<0)
b1
length(b1)
### Our objective is to print only the positive numbers
a1 = which(x1>0)
a1 ## The positions where the positive numbers are located
x1[a1] ## It returns you the positive numbers.
### Our objective is to print only the negative numbers
b1 = which(x1<0)
b1 ## The positions where the negative numbers are located
x1[b1] ## It returns you the negative numbers.
3. Graphics portion in the R software:
x = c(160, 162, 163, 165, 168, 170)
x
plot(x, type = "p") ### Scatter Plot
plot(x, type = "l") ### Line Plot
plot(x, type = "b") ### Line and Point together in a Plot
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height")
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height",
main = "Height of Six individuals")
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height",
sub = "Height of Six individuals")
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height",
main = "Height of Six individuals")
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height",
main = "Height of Six individuals", col = "red", lwd = 3, lty = 3)
plot(x, xlim = c(0, 4), ylim = c(160, 165), cex.lab = 1)
plot(x, type = "b", xlab = "Number of Persons", ylab = "Height",
main = "Height of Six individuals", col = "red", lwd = 3, lty = 3, cex.lab = 2)
plot(x, type = "l")
legend("topleft", legend = "Height of persons", lwd = 1, bty = "n")
#### Inserting Another Graph
y = seq(160, 165, length.out = 6)
lines(y, col = "red", lwd = 2, lty = 2)
plot(x, y, type = "b", col = "red", lwd = 2)
### Find the differences between the barplot and the histogram
x
barplot(x) ### Command for Barplot
hist(x, probability = F) ## Command for Histogram
#### Normally distributed data generation
n = rnorm(10, mean = 1, sd = 0.5)
n
number.of.sample.size = length(n)
hist(n)
mean(n)
std.error = sd(n)/sqrt(number.of.sample.size)
std.error
sd(n)
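#### A small added sketch (my own illustration): the standard error
#### shrinks as the sample size grows, since it divides sd(n) by sqrt(n)
n.big = rnorm(1000, mean = 1, sd = 0.5)
sd(n.big)/sqrt(length(n.big)) ## noticeably smaller than std.error above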
Day 3 : 28th January 2021 (11.30 a.m. - 2.30 p.m. )
We bifurcate the three-hour class into two sessions. The first session is organized around the construction of matrices in the R software. As we all know, any data set consists of a certain number of rows and columns, i.e. it has the structure of a matrix, so it is very important to be able to handle matrices in R. The following codes demonstrate the construction of a matrix in R.
Session 1: (11.30 a.m. - 1 p.m. )
####### Matrix formation using R software
#### In R software a matrix can be formed in 4 ways:
## 1. Binding process --- (i) Row bind; (ii) Column bind
## 2. Making a data frame ----- use the command data.matrix to convert it into a matrix
## 3. Using the matrix command
## 4. Using the diag command
#### Part 1 -- Binding process
### First create a vector
group1 = c(175, 165, 185, 195)
group2 = c(150, 160, 165, 175)
group3 = c(150, 160, 165, 175)
### Now, we will execute the first binding process i.e. row binding
Height.data.1 = rbind(group1, group2, group3); Height.data.1 ### Output is a matrix with separate row names
#### The following command provides separate column names so that
#### it will be very easy to understand the data structure
colnames(Height.data.1) = c("individual 1", "individual 2", "individual 3", "individual 4"); Height.data.1
##### Now, we will execute the 2nd binding process
Height.data.2 = cbind(group1, group2, group3); Height.data.2
### The following command provides the separate row names
rownames(Height.data.2) = c("individual 1", "individual 2", "individual 3", "individual 4"); Height.data.2
####### Now, coming to the second part of matrix formation
data = data.frame(group1, group2, group3); data
summary(data)
head(data, 2) ### printing the first 2 rows
complete.cases(data) ### returns TRUE for rows with no missing values
data = data.frame(group1, group2, group3); data
### Look at the output: besides giving the heights of the groups, it also provides a serial number
### for each row. This is the specialty of the data frame: if you find any discrepancy in the data,
### you can easily locate the offending row by its serial number, as sketched below.
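## For example (a small added sketch), to inspect row 2 by its serial number:
data[2, ] ## the full row
data$group1[2] ## or a single value from one column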
### Now, we will convert the data frame into matrix
new.data = data.matrix(data); new.data
### Again we can insert the row names here
rownames(new.data) = c("individual 1", "individual 2", "individual 3", "individual 4"); new.data
#### Now we can use the matrix() command, the most important tool in R for constructing any matrix
M1 = matrix(c(group1, group2, group3), nrow = 4, byrow = F); M1
M2 = matrix(c(group1, group2, group3), nrow = 4, ncol = 3, byrow = T); M2
## Finally we consider the following matrix
M = matrix(c(group1, group2, group3), nrow = 3, byrow = T); M
### Some basic operations on matrix
## Transpose of the matrix
M.transpose = t(M); M.transpose
### Extracting rows and columns from any matrix
### In R, row i of a matrix M is written M[i, ] and column j is written M[, j]
First.group = M[1,]; First.group
Mid.sem.1 = M[,1]; Mid.sem.1
##### Method of constructing a diagonal matrix
M.diagonal = diag(c(1, 2, 3, 4, 5)); M.diagonal
M.diagonal.1 = diag(group1); M.diagonal.1
##### Addition, Subtraction, Multiplication, Determinant, Inverse computation (the last two are sketched at the end of this block)
A = matrix(data = c(10, 122, 113, 515, 120, 100, 24, 25), nrow = 4, ncol = 4); A ## the 8 values are recycled to fill the 16 cells (R issues a warning)
B = matrix(data = seq(2, 9, by = 1), nrow = 4, ncol = 4); B ## the "by" belongs inside seq(); its 8 values are recycled here as well
addition = A + B; addition ### matrix addition (renamed to avoid masking the built-in sum())
subtraction = A - B; subtraction ### matrix subtraction
component.multiplication = A*B; component.multiplication ### element-wise multiplication
multiplication = A%*%B; multiplication ### matrix multiplication
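#### Determinant and inverse (an added sketch): A and B above are singular
#### because the recycled values repeat column-wise, so we use a fresh
#### invertible matrix C of our own choosing
C = matrix(c(2, 1, 1, 3), nrow = 2); C ## a small invertible 2 x 2 matrix
det(C) ## determinant: 2*3 - 1*1 = 5
C.inverse = solve(C); C.inverse ## solve() with a single argument returns the inverse
C %*% C.inverse ## should be (numerically) the 2 x 2 identity matrix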
Day 3 : 28th January 2021 (11.30 a.m. - 2.30 p.m. )
The next session is devoted to regression analysis with a real-life data set. We use this "Data" for today's discussion, so download it first and open it in Excel.
Session 2: (1.10 p.m. - 2.30 p.m. )
# 1. First copy the directory path, then place it inside the command setwd(".")
# 2. Note that the backslashes of a Windows path must be replaced by forward slashes, as below.
# 3. Then save the Excel file in .csv (Comma delimited) format.
setwd("C:/Users/USER/Dropbox/Techno_India")
getwd()
data = read.csv("data_set_1.csv")
data
View(data) ### Excel like view
summary(data) ## To infer the fundamental statistical measures of your dataset
m = data$Start.Up.Skills ## Extraction of column from data
length(m)
## There exists another process, which is given below
## An element can be accessed as data[i, j]; i = row number, j = column number
entrepenureship_rate = data[1:200,1]
entrepenureship_rate
Start.Up.Skills = data[1:200,2]
Start.Up.Skills
plot(Start.Up.Skills,entrepenureship_rate)
## Data frame construction
new.data = data.frame(Start.Up.Skills, entrepenureship_rate)
### Linear model fit
## Any linear equation can be written as y = a + b*x
## The linear model fit can be executed by the command "lm"
fit.linear = lm(entrepenureship_rate~Start.Up.Skills)
summary(fit.linear)
### Non-linear model fit
## There exist several non-linear models in the literature,
# but here I use y = c*exp(-x)
## The non-linear model fit can be executed by the command "nls"
fit.non.linear = nls(entrepenureship_rate~c*exp(-Start.Up.Skills), data = new.data
, start = list(c = 1))
summary(fit.non.linear)
## Coefficient extraction
coef(fit.linear)[1]
## See the difference between the above and the following two commands:
## as.numeric() strips off the coefficient's name, leaving a plain number for arithmetic
a = as.numeric(coef(fit.linear)[1]); a
b = as.numeric(coef(fit.linear)[2]); b
## Sorting the explanatory variable
new.skill = sort(Start.Up.Skills)
predicted.linear = a + b*new.skill ## Compute the predicted values
## Overlaying the fitted line on the scatter plot
lines(new.skill, predicted.linear, col = "green", lwd = 2)
c = as.numeric(coef(fit.non.linear)[1]); c
predicted.non.linear = c*exp(-new.skill)
lines(new.skill, predicted.non.linear, col = "red")
## AIC is the statistical measure based on which you can judge whether
## your model is performing better or not.
## The full form of AIC is Akaike Information Criterion.
## The lower the AIC value, the better the model.
AIC(fit.linear)
AIC(fit.non.linear)
####### Correlation #######
x = 1:100
y = 101:200
plot(x, y)
cor(x, y) ## Command to find the correlation
cor.test(x, y) ## To test whether the correlation is really significant or not
cor.test(Start.Up.Skills, entrepenureship_rate)
Day 4 : 8th February 2021 (11.30 a.m. - 2.30 p.m.)
Announcement:
Indian Statistical Institute, Kolkata regularly organizes a workshop on the SPSS software. Please go through the link "https://www.isical.ac.in/node/2666" to join this year. It is a 7-day workshop, which will be conducted in physical mode.
Like the previous class, we again bifurcate this class into two halves. The first session continues the regression discussion, and the second session is on time series analysis. So, for the first session, download the "Vehicle" data.
Session 1: (11.30 a.m. - 1.05 p.m.)
vehicle = read.csv("C:/Users/USER/Dropbox/Tuition/Techno_India/Time_Series_Analysis/vehicle.csv")
head(vehicle)
str(vehicle)
summary(vehicle)
vehicle$lh[vehicle$lh == 0] = NA ## treat zero entries of lh as missing values
vehicle$lc[vehicle$lc == 0] = NA ## likewise for lc
summary(vehicle)
pairs(vehicle[3:5])
cor(vehicle[3:5]) ## with NAs present, cor() returns NA; cor(vehicle[3:5], use = "complete.obs") would use only complete rows
vehicle = read.csv("C:/Users/USER/Dropbox/Tuition/Techno_India/Time_Series_Analysis/vehicle.csv")
head(vehicle)
str(vehicle)
summary(vehicle)
vehicle$lh[vehicle$lh == 0] = mean(vehicle$lh[vehicle$lh != 0]) ## impute with the mean of the valid (non-zero) entries
vehicle$lc[vehicle$lc == 0] = mean(vehicle$lc[vehicle$lc != 0]) ## including the zeros in the mean would bias it downward
summary(vehicle)
pairs(vehicle[3:5])
cor(vehicle[3:5])
### Random Sampling ####
x = seq(1, 10); x
sample(x, size = 3, replace = T) ## draw 3 values with replacement
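## For comparison (a small added sketch): without replacement, each element can appear at most once
sample(x, size = 3, replace = F)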
## Data Partition
#set.seed(1234) ## uncomment to make the random partition reproducible
ind = sample(2, nrow(vehicle), replace = T, prob = c(0.7, 0.3))
training = vehicle[ind == 1,]
testing = vehicle[ind == 2, ]
head(training)
head(testing)
cbind(summary(training$lc), summary(testing$lc))
### Multiple linear regression
model = lm(lc~lh + Mileage, data = training)
model
summary(model)
plot(lc~lh, training)
abline(model, col = "blue")
## Model Diagnostics
par(mfrow = c(2,2))
plot(model)
vehicle[1620,] ## inspect observation 1620 (e.g. one flagged in the diagnostic plots above)
# Prediction
pred = predict(model, testing)
head(pred)
head(testing)
## Outlier detection ##
boxplot(testing$lc)
## What is Multicollinearity ##
# 1. Occurs when there are moderate to high correlations
#    among the independent variables
# 2. Leads to unstable estimates
# 3. A Variance Inflation Factor (VIF) > 10 indicates
#    the presence of multicollinearity
## Variance Inflation Factor (VIF)
str(training)
pairs(training[2:6])
newmodel = lm(lc~lh+Mileage+mc+fm, data= training)
library(faraway)
vif(newmodel)
Day 4 : 8th February 2021 (11.30 a.m. - 2.30 p.m.)
Session 2: (1.30 - 2.30 p.m.)
We will discuss time series analysis in this session. A time series consists of three components: (i) trend, (ii) seasonality, and (iii) random fluctuations. First we will identify these three components in the AirPassengers data, which is built into the R software. Then we perform a time series forecast with the help of this data. The following R code demonstrates these issues.
## Time series data
data("AirPassengers")
AP = AirPassengers
str(AP)
View(AP)
head(AP)
ts(AP, frequency = 12, start = c(1949, 1)) ## command for creating a time series object
# frequency = 12 means monthly data (12 observations per year)
# start indicates the starting year and the starting month
plot(AP)
AP = log(AP) ## log transformation stabilizes the increasing variance of the series
plot(AP)
decomp = decompose(AP)
plot(decomp$figure, type = "b",
xlab = "Month", ylab = "Seasonality Index", col = "blue", las = 2)
plot(decomp)
############ Forecasting #########
library(forecast)
model = auto.arima(AP)
model
acf(model$residuals, main = "Correlogram")
pacf(model$residuals, main = "Partial Correlogram")
hist(model$residuals, col ='red', xlab = 'Error', main = 'Histogram of residuals', freq = F)
lines(density(model$residuals))
##### Forecast
f = forecast(model, 48)
plot(f)
accuracy(f)
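## The point forecasts live in f$mean; since AP was log-transformed above,
## exp() brings them back to the original passenger scale (a small added sketch):
head(exp(f$mean), 12) ## the next 12 months of predicted passenger numbers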
Day 5 : 11th February 2021 (11.30 a.m. - 2.30 p.m.)
Session 1: (11.30 a.m. - 1.10 p.m.)
Like the previous two classes, we are again going to bifurcate this class into two halves. The first session is hands-on training on cluster analysis. The class then continues to the second session, where the topic of discussion is Principal Component Analysis, PCA for short. The first session is demonstrated on a data set named "Utilities"; download this data set before starting the session.
# Cluster Analysis
mydata <- read.csv(file.choose(), header = T)
str(mydata)
head(mydata)
pairs(mydata)
# Scatter plot
plot(Fuel_Cost ~ Sales, data = mydata)
with(mydata, text(Fuel_Cost ~ Sales, labels = Company, pos = 4, cex = 0.6))
# Normalize
z = mydata[,-1] ## drop the first column (the non-numeric company identifier)
means = apply(z,2,mean)
sds = apply(z,2,sd)
nor = scale(z,center=means,scale=sds)
##calculate distance matrix (default is Euclidean distance)
distance = dist(nor)
distance
# Hierarchical agglomerative clustering using default complete linkage
mydata.hclust = hclust(distance)
plot(mydata.hclust)
plot(mydata.hclust,labels=mydata$Company,main='Default from hclust')
plot(mydata.hclust,labels=mydata$Company,hang=-1)
# Hierarchical agglomerative clustering using "average" linkage
mydata.hclust.average<-hclust(distance,method="average")
plot(mydata.hclust.average,labels=mydata$Company, hang=-1)
# Cluster membership
member.c = cutree(mydata.hclust,3)
member.a = cutree(mydata.hclust.average,3)
table(member.c, member.a)
#Characterizing clusters
aggregate(nor,list(member.a),mean)
aggregate(nor,list(member.c),mean)
aggregate(mydata[,-1],list(member.c),mean)
# Scree Plot
wss <- (nrow(nor)-1)*sum(apply(nor,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(nor, centers=i)$withinss)
plot(1:20, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares")
# K-means clustering
kc<-kmeans(nor,3)
kc
library(cluster)
clusplot(nor, kc$cluster, shade = T, labels = 2, lines = 0, main = "k-Means Cluster Analysis")
Day 5 : 11th February 2021 (11.30 a.m. - 2.30 p.m.)
Session 2: (1.10 - 2.30 p.m.)
In this session we will learn about Principal Component Analysis (PCA). As we all know, PCA is used to reduce the dimension of a data set. Check the following code for illustration.
## Data
data("iris")
str(iris)
summary(iris)
## Partition Data
ind = sample(2, nrow(iris), replace = T, prob = c(0.8, 0.2))
training = iris[ind == 1,]
testing = iris[ind == 2,]
##Scatter plot and Correlations
library(psych)
pairs.panels(training[,-5], gap = 0, bg = c("red", "yellow", "blue")[training$Species], pch = 21)
### Principal Component Analysis
pc = prcomp(training[,-5], center= T, scale. = T)
#pc = pca(training[,-5], center= T, scale. = T)
attributes(pc)
pc$center
mean(training$Sepal.Length)
pc$scale
sd(training$Sepal.Length)
print(pc)
## The rotation matrix contains the loadings (the weight of each original variable in each PC)
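## To inspect them directly (a small added check):
pc$rotation ## one column of loadings per principal component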
summary(pc)
## Orthogonality of PCs
pairs.panels(pc$x, gap = 0,
bg = c("red", "yellow", "blue")[training$Species],
pch = 21)
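## A numeric check of the orthogonality (a small added sketch):
round(cor(pc$x), 3) ## the correlations between PC scores are (numerically) zero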
# Bi-plot
library(devtools)
library(ggplot2)
#install_github("vqv/ggbiplot")
#library(factoextra)
#biplot(pc)
library(ggfortify)
screeplot(pc, type = "l", main = "Scree Plot/Elbow Plot")
autoplot(pc, data = training, colour = 'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
## Prediction with Principal Components
trg = predict(pc, training)
trg
trg = data.frame(trg, training[5])
tst = predict(pc, testing)
tst
tst = data.frame(tst, testing[5]) ## testing[5] keeps the column name "Species", matching trg above