Workshop in the Department of Management, North Eastern Hill University, Meghalaya
(13th March-19th March, 2023)
The workshop is organized by the Department of Management, North Eastern Hill University, Meghalaya, in collaboration with the Indian Statistical Institute, Kolkata. The main aim of this workshop is to refresh and update participants' knowledge of statistics in connection with empirical data. Nowadays, analysing data sets of this kind requires a good working knowledge of some statistical software. One of the most popular packages in this domain is R. The main reason for its popularity is that it is free of cost and can easily be downloaded from the provided link (R software). Another reason is its versatility: the software is useful in a wide range of domains.
Introduction to R software:
R was begun in the 1990s by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland. Nearly 20 senior statisticians form the core development group of the R language, including the primary developer of the original S language, John Chambers, of Bell Labs. R contains a large, coherent, integrated collection of intermediate tools for data analysis. The software is FREE and can be downloaded from http://www.r-project.org/. Part of its versatility is that it can be used in almost every domain, and its coding structure is comparatively easy to learn. Moreover, R is not only for coding: you can also prepare manuscripts and presentations with it, using R Markdown. The major strength of the software is its packages. Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. Currently, the CRAN package repository features 10964 available packages (https://cran.r-project.org/web/packages/). To install packages, type install.packages(pkgs, dependencies=TRUE) in the console.
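As a minimal sketch of the install-then-load workflow described above: the helper name install_if_missing is hypothetical (it is not part of R), and "stats" is used only because it ships with every R installation, so nothing is actually downloaded here. Replace it with the name of the package you need.

```r
# Hypothetical helper (not part of R): install a package only if it is missing.
install_if_missing = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # Report whether the package is available after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ok = install_if_missing("stats")  # "stats" is an illustrative choice
print(ok)

library(stats)   # load the package so that its functions can be used
.libPaths()      # the library directories where packages are stored
```

After a package is installed once, only the library() call is needed in later sessions.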
I have used this software in most of my research work. If you find any discrepancy, do not hesitate to contact me; my contact details are provided on my home page. This page will be updated as per your needs.
References:
An Introduction to R, Longhow Lam. (Link for the material: https://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf )
Applied Statistical Inference: Likelihood and Bayes, Leonhard Held and Daniel Sabanes Bove, Springer-Verlag Berlin, 2014.
The R Student Companion, Brian Dennis, CRC Press, 2013.
An Introduction to Statistical Learning with Applications in R, James, Witten, Hastie and Tibshirani, Springer Texts in Statistics, 2013.
Using R for Numerical Analysis in Science and Engineering, Victor A. Bloomfield, CRC Press, Taylor and Francis Group, 2014.
A Primer of Ecology with R, M. Henry H. Stevens, Springer, 2009.
Statistical Modeling: The Two Cultures, Leo Breiman, Statistical Science, 2001, Vol. 16, No. 3, 199-231.
The Art of R Programming, Norman Matloff.
An R Companion for the Handbook of Biological Statistics, Salvatore S. Mangiafico. (Link for the material: https://rcompanion.org/documents/RCompanionBioStatistics.pdf )
Day 1:
Prof. Bhattacharya began with a basic lecture on statistics. Learning the basic statistical concepts is essential for analysing the data we collect, and we will apply this statistical knowledge through R. The code presented here will help you first to learn R and then to apply statistics with it. The first part, "Basic understanding of the sampling distribution with the help of R software", simply illustrates in R the statistical theory you have already covered. There is no need to worry about the code at this stage, because afterwards we will introduce R from the very basics.
1. Basic Understanding of the sampling distribution with the help of R software:
set.seed(897)
############# Data
############## The unit of the height is in cm #################
Height = rnorm(1000, 165, 10)
Height = round(Height, 0)
Height
#######graphical representation
plot(Height, xlab="students no.", ylab="Height")
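######frequency distribution
par(mfrow=c(2, 2)) ### arrange the next four plots in a 2 x 2 grid
hist(Height)
#### Optional: density-scaled histogram with a Gaussian kernel density curve
#hist(Height, prob=TRUE)
#lines(density(Height, kernel="gaussian"))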
##########sample of size 100
sample100=sample(Height, 100)
sample100
######sample frequency distribution
hist(sample100, col="red", main = "Height distribution of 100 students")
##########sample of size 500
sample500=sample(Height, 500)
sample500
######sample frequency distribution
hist(sample500, col="blue", main = "Height distribution of 500 students")
##########100 samples, each of size 100
samplecomb=matrix(NA, 100, 100) ### one sample per row
for (i in 1:100)
{
  samplecomb[i, ]=sample(Height, 100)
}
######distribution of the 100 sample means (sampling distribution of the mean)
sample.comb.row.mean=rowMeans(samplecomb)
hist(sample.comb.row.mean, col="green")
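The histogram of sample means above is much narrower than the histogram of the heights themselves. A minimal sketch of why, under the assumption that the usual standard-error formula with a finite population correction applies (sample() draws without replacement from the 1000 heights); the names N, n, sample.means and se.theory are introduced here only for illustration:

```r
set.seed(897)
Height = round(rnorm(1000, 165, 10), 0)   # same population as above

N = length(Height)   # population size
n = 100              # size of each sample

# 100 sample means, each from a sample of size n drawn without replacement
sample.means = replicate(100, mean(sample(Height, n)))

sd(Height)          # spread of individual heights
sd(sample.means)    # spread of the sample means (much smaller)

# Approximate theoretical standard error of the mean,
# including the finite population correction
se.theory = sd(Height) / sqrt(n) * sqrt((N - n) / (N - 1))
se.theory
```

The observed spread of the sample means should be of the same order as se.theory, although it will not match exactly because only 100 samples are drawn.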
############# Data
set.seed(897)
Height.group.1 = rnorm(1000, 165, 10)
Height.group.1 = round(Height.group.1, 0)
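############# Data
### Note: because the same seed is reused, the first rnorm() call reproduces
### the random draws behind Height.group.1 (shifted to mean 171), so the two
### groups are strongly correlated; the second rnorm() adds independent noise.
set.seed(897)
Height.group.2 = rnorm(1000, 171, 10) + rnorm(1000, 0, 2)
Height.group.2 = round(Height.group.2, 0)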
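#######graphical representation
par(mfrow=c(1, 1)) ### back to a single plot per page
plot(Height.group.1, Height.group.2)
Height.group.data = data.frame(Height.group.1, Height.group.2)
### Pearson correlation between the two groups
cor.height.group.1.2=cor(Height.group.1, Height.group.2); cor.height.group.1.2
### simple linear regression of group 2 on group 1
fit.linear=lm(Height.group.2~Height.group.1, data=Height.group.data)
abline(fit.linear, col = "red") ### add the fitted line to the scatter plot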
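The fitted line drawn above can also be inspected numerically. The following is a sketch only: it rebuilds the same two groups so that it runs on its own, and the names r, r.squared and the new height value 170 are illustrative choices, not part of the workshop code.

```r
set.seed(897)
Height.group.1 = round(rnorm(1000, 165, 10), 0)
set.seed(897)
Height.group.2 = round(rnorm(1000, 171, 10) + rnorm(1000, 0, 2), 0)
Height.group.data = data.frame(Height.group.1, Height.group.2)

fit.linear = lm(Height.group.2 ~ Height.group.1, data = Height.group.data)

coef(fit.linear)   # intercept and slope of the fitted line

# For a simple linear regression with an intercept,
# R-squared equals the squared Pearson correlation
r = cor(Height.group.1, Height.group.2)
r.squared = summary(fit.linear)$r.squared
all.equal(r^2, r.squared)

# Predicted group 2 height for a (hypothetical) group 1 height of 170 cm
predict(fit.linear, newdata = data.frame(Height.group.1 = 170))
```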
3. Basic operations in R software:
You have already seen the R interface and, to some extent, R code in the previous session, where Prof. Bhattacharya took you on a journey through statistics using R. From that you can see how closely the software is tied to statistics. What we have not yet understood is the meaning of the code: several lines were written in a programming language, and when they were executed the output appeared as if from a black box. That is a real problem and a challenging one, which is why this workshop is designed to help you understand R from its roots. Two post-lunch sessions are devoted to R. With this brief introduction, let us start. In this first session I will proceed in two steps: first I will introduce R using slides, and then I will switch to R itself for a live demonstration.
x = 2 ### store a value in x
##### To run a line of code, press "Ctrl + Enter" together on your keyboard.
print(x) #### print that value
x #### typing the name alone also prints it
y = 7
print(y)
y
###### Some Fundamental operations ###
z1 = x+y ### Addition of two numbers
z1
z2 = x-y ### Subtraction of two numbers
z2
z3 = x*y; z3 ### Multiplication of two numbers
z4 = x/y; z4 #### Division of two numbers
##### Storing more than one number and performing the same operations as above
### Vector creation
c() ### with no arguments, c() returns NULL (an empty result)
x1 = c(3, 5, 2, 6) ##### 4 numbers are stored here; c stands for concatenation
length(x1) ##### check how many numbers are stored
y1 = c(10, 5, 3, 9)
length(y1)
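#### Repeat the same operations, now element by element
a = x1 + y1; a
e = sum(x1, y1); e #### sum() adds up all the elements of both vectors into a single total
b = x1 - y1; b
c = x1*y1; c #### note: this variable shares its name with the function c(); calls to c() still work
d = x1/y1; d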
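A short sketch of the difference between the element-wise addition x1 + y1 and the single total returned by sum(x1, y1), using the same two vectors; the name total is introduced here only for illustration:

```r
x1 = c(3, 5, 2, 6)
y1 = c(10, 5, 3, 9)

a = x1 + y1        # element-wise: one result per position
a                  # 13 10 5 15

total = sum(x1, y1)   # one number: all eight values added together
total                 # 43

sum(a) == total    # TRUE: adding up the element-wise sums gives the same total
```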
#### Generating sequences of numbers
#### There are two ways to generate a sequence.
##### The first (the colon operator) applies when the common difference
##### between the numbers is 1; the second, seq(), works
###### for any common difference.
x1 = 1:10; x1 ### generates the first 10 natural numbers
x2 = seq(from = 1, to = 10, by = 3) ### here the common difference between the numbers is 3
seq(1, 10, by = 3)
#### Generating a sequence of a given length
x3 = seq(1, 50, length.out = 30) #### most of the generated numbers are fractions
x3
length(x3)
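When length.out is given instead of by, seq() chooses the common difference itself. A minimal sketch of that relationship, where the name step is introduced here for illustration:

```r
x3 = seq(1, 50, length.out = 30)

# With 30 equally spaced numbers there are 29 gaps between 1 and 50,
# so the common difference is (to - from) / (length.out - 1)
step = (50 - 1) / (30 - 1)
step

diff(x3)[1:3]                          # the first few gaps all equal step
all.equal(diff(x3), rep(step, 29))     # every gap matches, up to rounding error
```

Use by when the spacing matters and length.out when the number of points matters.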