Workshop in the Department of Management, North Eastern Hill University, Meghalaya

(13th March-19th March, 2023)

The workshop is organized by the Department of Management, North Eastern Hill University, Meghalaya, in collaboration with the Indian Statistical Institute, Kolkata. The main aim of this workshop is to refresh and update participants' knowledge of statistics in connection with empirical data. Nowadays, the analysis of such data sets requires a good working knowledge of some statistical software. One of the most popular choices in this domain is R. The main reason for its popularity is that it is free of cost and can easily be downloaded from the provided link (R software). Another reason is its versatility: the software is useful in virtually every domain.

Introduction to R software:

R was begun in the 1990s by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland. Nearly 20 senior statisticians form the core development group of the R language, including the primary developer of the original S language, John Chambers, of Bell Labs. R contains a large, coherent, integrated collection of intermediate tools for data analysis. The software is free and can be downloaded from http://www.r-project.org/. Part of its versatility is that it can be used in every domain, and its coding structure is comparatively easy to learn. Moreover, R is not only for coding: with R Markdown you can also prepare manuscripts and presentations. A major strength of the software is its packages. Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. At the time of writing, the CRAN package repository features 10964 available packages (https://cran.r-project.org/web/packages/). To install packages, type install.packages(pkgs, dependencies=TRUE) in the console.
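
As a minimal sketch of the package workflow described above, the lines below look at the library and check whether a package is already present before suggesting an installation. The package name MASS is only an illustrative choice, not one prescribed by the workshop, and the names pkg, lib.dirs and is.installed are introduced here for the example.

```r
# Illustrative package name (an assumption; substitute the package you need)
pkg <- "MASS"

# Directories that make up the library on this machine
lib.dirs <- .libPaths()
print(lib.dirs)

# TRUE if the package can be found in the library, FALSE otherwise
is.installed <- requireNamespace(pkg, quietly = TRUE)
print(is.installed)

# Only suggest an installation when the package is missing
if (!is.installed) {
  message("Run install.packages(\"", pkg, "\", dependencies = TRUE) to install it")
}
```

Once a package is installed, library(MASS) (or the name of your own package) loads it for the current session.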

I have used this software in most of my research work. If you run into any difficulty, do not hesitate to contact me; my contact details are provided on my home page. This page will be updated as per your needs.

References:

Day 1 :

1. Basic Understanding of the sampling distribution with the help of R software:

set.seed(897)


############# Data


############## The unit of the height is in cm #################


Height = rnorm(1000, 165, 10) 


Height = round(Height, 0)


Height


#######graphical representation


plot(Height, xlab="students no.", ylab="Height")


######frequency distribution


par(mfrow=c(2, 2))


hist(Height)


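# Optional: draw the histogram on the density scale and overlay a kernel density estimate

# hist(Height, prob = TRUE)

# lines(density(Height, kernel = "gaussian"))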


##########sample of size 100


sample100=sample(Height, 100)


sample100


######sample frequency distribution


hist(sample100, col="red", main = "Height distribution of 100 students")


##########sample of size 500


sample500=sample(Height, 500)


sample500


######sample frequency distribution


hist(sample500, col="blue", main = "Height distribution of 500 students")


########## 100 samples, each of size 100


samplecomb=matrix(NA, 100, 100)


for (i in 1:100) {
  samplecomb[i, ] = sample(Height, 100)   # row i holds the i-th sample
}


sample.comb.row.mean=rowMeans(samplecomb[, 1:100])


hist(sample.comb.row.mean, col="green")
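
The histogram of the sample means is much narrower than the histogram of the heights themselves. A rough check, offered as a sketch rather than as part of the workshop script, compares the spread of the sample means with the theoretical standard error. The names n.rep, sample.means, sigma, se.simple and se.fpc are introduced here for illustration, and 1000 repetitions are used instead of 100 only to make the comparison more stable. Because sample() draws without replacement from the 1000 heights, a finite population correction is included.

```r
set.seed(897)
Height = round(rnorm(1000, 165, 10), 0)   # same population as above (cm)

N = length(Height)   # population size
n = 100              # sample size
n.rep = 1000         # number of repeated samples (assumption: more than the 100 used above)

# Mean of each repeated sample of size n, drawn without replacement
sample.means = replicate(n.rep, mean(sample(Height, n)))

# Population standard deviation (divisor N, since Height is treated as the whole population)
sigma = sqrt(mean((Height - mean(Height))^2))

se.simple = sigma / sqrt(n)                    # standard error ignoring the finite population
se.fpc = se.simple * sqrt((N - n) / (N - 1))   # with the finite population correction

c(observed = sd(sample.means), se.simple = se.simple, se.fpc = se.fpc)
```

The observed standard deviation of the sample means should lie close to se.fpc, and both are far smaller than the standard deviation of the individual heights.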


############# Data for group 1 (heights in cm)


set.seed(897)


Height.group.1 = rnorm(1000, 165, 10)



Height.group.1 = round(Height.group.1, 0)


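############# Data for group 2 (heights in cm)


##### Resetting the same seed makes the first rnorm() call below reuse the random draws behind group 1,

##### so group 2 is roughly group 1 + 6 cm plus extra noise; this is why the two groups are strongly correlated.


set.seed(897)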


Height.group.2 = rnorm(1000, 171, 10) + rnorm(1000, 0, 2)


Height.group.2 = round(Height.group.2, 0)


#######graphical representation


plot(Height.group.1, Height.group.2)


Height.group.data = data.frame(Height.group.1, Height.group.2)


cor.height.group.1.2=cor(Height.group.1, Height.group.2); cor.height.group.1.2


fit.linear=lm(Height.group.2~Height.group.1, Height.group.data)


abline(fit.linear, col = "red")
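
For a simple linear regression, the fitted slope equals the correlation coefficient multiplied by the ratio of the standard deviations: slope = r * sd(y) / sd(x). The sketch below checks this on the same simulated data; the shorter names x, y, fit, r and slope are introduced here only for illustration.

```r
set.seed(897)
x = round(rnorm(1000, 165, 10), 0)                       # group 1 heights (cm)
set.seed(897)
y = round(rnorm(1000, 171, 10) + rnorm(1000, 0, 2), 0)   # group 2 heights (cm)

fit = lm(y ~ x)
coef(fit)                      # intercept and slope of the fitted line

r = cor(x, y)
slope = unname(coef(fit)[2])   # fitted slope

# The two quantities agree up to rounding error
all.equal(slope, r * sd(y) / sd(x))

# The squared correlation equals the R-squared of the regression
all.equal(summary(fit)$r.squared, r^2)
```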

3. Basic operations in R software:

x = 2 ### storing a particular value


##### To run a line of code, press "Ctrl + Enter" together on your keyboard.


print(x) #### Print that value


x


y = 7


print(y)


y

###### Some Fundamental operations ###


z1 = x+y ### Addition of two numbers


z1


z2 = x-y ### Subtraction of two numbers


z2


z3 = x*y; z3 ### Multiplication of two numbers


z4 = x/y; z4 #### Division of two numbers

##### Storing more than one number and performing the same operations as above


### Vector creation


c() ### With no arguments, c() returns NULL (an empty object)


x1 = c(3, 5, 2, 6) ##### 4 numbers are stored here and c stands for concatenation


length(x1) ##### Check how many numbers are stored in x1


y1 = c(10, 5, 3, 9)


length(y1)


#### Repeat the same operations; on vectors they work element by element


a = x1 + y1; a


e = sum(x1, y1); e #### sum() adds up all the elements of both vectors and returns a single total, not an element-wise sum


b = x1 - y1; b


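c = x1*y1; c ### Element-wise product (a variable named "c" works, but it is best avoided because c() is also a function)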


d = x1/y1; d
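
The difference between element-wise arithmetic and a total is easy to miss, so here is a short sketch using the same two vectors. The last two lines, which multiply a vector by a single number and by a shorter vector, go slightly beyond the workshop script and rely on R's recycling rule.

```r
x1 = c(3, 5, 2, 6)
y1 = c(10, 5, 3, 9)

x1 + y1          # element-wise sum: 13 10 5 15
sum(x1, y1)      # one grand total: 43
sum(x1 + y1)     # the same total, computed from the element-wise sum

x1 * 2           # the single number 2 is recycled: 6 10 4 12
x1 * c(1, 10)    # c(1, 10) is recycled as 1, 10, 1, 10: 3 50 2 60
```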


#### Sequence of number generation


#### There are two ways to generate a sequence of numbers.


##### The first method (the colon operator) applies when the common difference


##### between the numbers is 1;


###### the second method (seq) works for any common difference.


x1 = 1:10; x1 ### Generates the first 10 natural numbers


x2 = seq(from = 1, to = 10, by = 3) ### Note that the common difference between the numbers is 3


seq(1, 10, by = 3)


#### Generating a sequence of a given length


x3 = seq(1, 50, length.out = 30) #### 30 equally spaced numbers from 1 to 50; most of them are fractions

x3


length(x3)
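
With length.out the step is not given directly; R works it out as (to - from) / (length.out - 1). A small sketch, with the name step introduced only for illustration:

```r
x3 = seq(1, 50, length.out = 30)

step = (50 - 1) / (30 - 1)     # implied common difference, 49/29
step

# Every gap between neighbouring values equals that step
all.equal(diff(x3), rep(step, 29))

# The same sequence written with an explicit step
all.equal(x3, seq(1, 50, by = step))
```

Use by when the spacing matters and length.out when the number of points matters.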

3. Data structure in R software: