Simulation

Monte Carlo Integration

Suppose that we want to integrate the function f(x) = x^2 exp(-2x) over the interval (0, 6).

my_fun = function(x){ # declaration of the function

(x^2)*exp(-2*x) # body of the function

}


# It is always a good idea to plot the function

par(mfrow =c(1,1)) # figure plotting windows

curve(my_fun(x), 0, 6, col = "red", lwd=2)


# Another way to plot it (mainly for Python users)

x = seq(0, 6, length.out = 100)

print(x)

val_fun = my_fun(x) # evaluate the function at x values (mesh)

plot(x, val_fun, col = "red", type = "l", lwd=2)


# Now let us consider doing the integration

# Use of inbuilt function.


val_integral = integrate(my_fun, lower = 0, upper = 6)


# Note: The function integrate is used to perform

# integration in R


print(val_integral) # the value can be considered exact


# let us do the same integral by using simulation

# We shall write down the integral as an expectation of

# some random variable.

# Consider X as a uniformly distributed random variable

# over the interval (0,6). Its density function is

# f(x) = 1/6, 0 < x < 6.

# Then integral_0^6 x^2 e^(-2x) dx = 6 * integral_0^6 x^2 e^(-2x) * (1/6) dx,

# i.e. 6 * E(X^2 e^(-2X)), where X follows U(0,6).


# So the problem boils down to approximating the expectation

# of the random variable g(X) = X^2 e^(-2X), where X follows

# U(0,6), and multiplying it by 6.


# Weak Law of Large Numbers

# If X_1, X_2, ..., X_n is a sample of size n from some population

# with mean mu, then the sample mean (X_1 + X_2 + ... + X_n)/n

# converges to the population mean mu in probability.

# Note 1: The sample mean is a random variable.

# But as n tends to infinity, this random quantity behaves like a

# constant, and that constant is nothing but the population mean

# (mu).


# In our problem, we have random variable g(X) and we

# want to evaluate E(g(X)). So if, we simulate a large number

# of samples from the distribution of g(X) and average them

# then the average value should be close to the E(g(X)), by

# Weak Law of Large Numbers.


# However, the distribution of g(X) may not be known. So we

# will simulate random numbers from the distribution of X

# (uniform(0,6)) and evaluate the function g(.) at

# those values.


n = 1000 # number of observations to be simulated from U(0,6)

x_sim_val = runif(n, min = 0, max = 6)


# It is always a good idea to check the code immediately

# after you write it.

print(x_sim_val)

hist(x_sim_val, probability = TRUE, col = "grey")


g = my_fun

g_sim_val = g(x_sim_val)

g_sim_val

hist(g_sim_val, probability = TRUE, col = "grey")


approx_integral = 6*sum(g_sim_val)/n # (length of interval) * (sample mean of g values)

approx_integral


# What we have done: we have approximated the integral

# using the Monte Carlo integration method. It is

# important to report how many simulated observations were

# used to compute the value of the integral.

# In this case, we used n = 1000.


# As a programmer, you should always explore and try things.

# Had we considered n = 50, would we have obtained the same accuracy?

# It is important to study the accuracy of the approximation

# as a function of the sample size n.


# We want to evaluate the integral as function of n as well

# We want to do the same exercise for different values of n

# We can use an array for values of n and run a loop


n_vals = 1:1000 # an array containing 1,2,3,...,1000

print(n_vals)

approx_integral = numeric(length = length(n_vals))


for (n in n_vals) {

x_sim_val = runif(n, min = 0, max = 6)

g_sim_val = g(x_sim_val)

approx_integral[n] = 6*sum(g_sim_val)/n

}

print(approx_integral) # important


# We now plot the approximate integral as a function of n

plot(n_vals, approx_integral, type = "l", col = "red", lwd=2)

abline(h = val_integral$value, col = "grey", lwd=2)


# How to write a legend nicely

legend("topright", legend = c("monte carlo integral", "using integrate()"),

col = c("red", "grey"), lwd=c(2,2))


# We can see that as n increases the accuracy of the monte

# carlo integration increases.


# What we have learnt is that any definite integral

# can be evaluated using Monte Carlo integration. The key

# step is that we need to write the integral as an expectation

# and use the property that the sample mean converges to the

# population mean (in probability) as n increases.


# Exercise:

# (a) Integrate x^2 e^(-2x) over the interval (0, infty).

# (b) Suppose that I_n is the value of the integral

# based on a sample of size n. As the sample values

# change, the value of I_n also changes. Essentially,

# the quantity I_n is a random variable, not a fixed number.

# Show that the expectation of I_n is I (the exact integral)

# and compute the variance of I_n using simulation. Visualize

# the shape of the SAMPLING DISTRIBUTION of I_n using

# 1000 replications. Do this for n = 10, n = 30, n = 50, n = 100

# in the same graph.
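# One possible sketch for part (a) (my illustration, not from the lecture): sample
# from an Exp(rate = 2) distribution, whose density is 2*exp(-2x) on (0, infty), so that
# integral_0^infty x^2 e^(-2x) dx = (1/2) * E(X^2), where X ~ Exp(2).

n = 10000
x = rexp(n, rate = 2)
mean(x^2)/2                                       # Monte Carlo approximation
integrate(my_fun, lower = 0, upper = Inf)$value   # exact value is 2!/2^3 = 0.25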


Statistical Simulation: Lecture - I (Date: 12th January 2018)

There are n persons and their hats are numbered 1, 2, 3, ..., n. Suppose they throw their hats into the center of the room and each person is asked to randomly pick one hat. We are interested in computing the probability that at least one person gets his own hat. This problem was discussed last semester in class and appeared in the examination too. The exact probability of at least one match is 1 - sum_{k=0}^{n} (-1)^k/k!, which tends to 1 - 1/e as n tends to infinity (the simpler expression 1 - (1 - 1/n)^n, which treats the picks as independent, has the same limit). Let us verify this claim using statistical simulation. The plan is as follows: we randomly select n numbers from the set {1, 2, ..., n} by simple random sampling without replacement, so that every number is selected in each experiment. This experiment can be performed in R using sample(1:n, size = n, replace = FALSE); the argument replace = FALSE indicates that once a number is selected it will not appear again. Use set.seed(123) so that everyone obtains the same answer.

n = 10 # Number of hats

x = 1:n # Reference numbers to be checked against each experiment's outcome

m = 1000 # Number of experiments

out = matrix(data = NA, ncol = n, nrow = m) # Experimental outcomes to be stored in out

for(i in 1:m){

out[i, ] = sample(x, size = length(x), replace = FALSE)

}

s = numeric(m) # Indicator of at least one match in each experiment

for(i in 1:m){

if(sum(x==out[i,])>0)

s[i] = 1

}

prob = sum(s)/m

print(prob) # Probability of at least one success

Exercise: Write an R program to perform the above experiment with varying n, the number of hats. Plot the probability of at least one match (obtained from simulation) against n.

Solution:

n = 1:50 # number of persons

m = 10000 # number of times the experiment is performed to estimate the probability of at least one match

prob_exact = sapply(n, function(k) 1 - sum((-1)^(0:k)/factorial(0:k))) # exact probability of at least one match

prob_sim = numeric(length = length(n))

for(i in 1:length(n)){

x = 1:n[i]

out = matrix(data = NA, ncol = n[i], nrow = m)

for(j in 1:m){

out[j, ] = sample(x, size = length(x), replace = FALSE)

}

s = numeric(m) # Indicator of at least one match in each experiment

for(j in 1:m){

if(sum(x==out[j,])>0)

s[j] = 1

}

prob_sim[i] = sum(s)/m

}

plot(n, prob_sim, type = "l", col = "red", ylim = c(0.4,1), lwd=2)

lines(n, prob_exact, type = "l", col = "blue", lwd=2)

legend("topright", legend = c("empirical prob.", "exact prob."), lwd=c(2,2), col=c("red","blue"), bty="n")

While solving the above problem, I felt that a short introduction to R was required; hence the following code was developed during the lecture and the basics were explained. These will be covered rigorously in your Software Lab-II course; this page is only for statistical simulation. Please find time to run these codes and understand their functionality for a better grasp of R programming. We shall write programs explicitly, so it is better that you get familiar with vector, matrix, data.frame, and list in R.

In the following we evaluate the probability that X lies between 0 and 1, where X is normally distributed with mean 1 and standard deviation 1. The required area under the curve is then approximated by means of simulation.

mu = 1

sigma = 1

fun = function(x){

(1/(sqrt(2*pi)*sigma))*exp(-(x-mu)^2/(2*sigma^2))

}

prob = integrate(fun, 0, 1)$value

curve(fun(x), -3, 5, col = "red", lwd=3)

segments(0,0,0,fun(0), lwd=2)

abline(h=0, lwd=2)

segments(1,0,1,fun(1), lwd=2)


rep = 100

count = numeric(rep)

for(i in 1:rep){

x = rnorm(i, mean = mu, sd = sigma) # sigma is the standard deviation here

count[i] = sum(x>0 & x<1)/i # proportion of the i simulated values lying in (0,1)

}

plot(1:rep, count, col = "red", pch = ".")

abline(h = prob, col = "green", lwd=2)

Lecture - III (Date: 17th January 2018)

We start our discussion with some plot options. A few students had doubts about them, in particular the type = option and how to put legends in a plot, so we clarify those first. The code developed during the lecture follows; the concepts are self-explanatory. Use help(plot), help(legend), help(lines), help(points), help(seq) to learn more about those functions.

x = seq(1, 10 ,length.out = 100)

print(x)

y = sin(x)

z = cos(x)

plot(x, y, col = "red" )

points(x, z, col = "blue")

plot(x, y , col = "red", type="l")

lines(x, z, col = "blue", type = "l")

plot(x, y , col = "red", type="b", lwd =2)

lines(x, z, col = "blue", type = "b" ,lwd=2)

# Adding legend

plot(x, y , col = "red", type="l", lwd =2, xlim = c(1, 10),

ylim=c(-1,1.5))

lines(x, z, col = "blue", type = "l", lwd=2)

legend("topright", legend = c("sin(x)","cos(x)"),

col = c("red", "blue"), lwd =c(2,2), bty = "n")

Then we recalled what we did in the previous lecture, where we wrote a function generate_exp() to generate n random numbers from the exponential distribution with a fixed rate parameter lambda. We now repeat the process for the logistic(mu, beta) distribution. Note that for the logistic distribution mu is the location parameter and beta is the scale parameter. It is important to introduce the concept of location and scale families.

Location Family: Let f(x) be any probability density function. Then the family of probability density functions f(x - mu), indexed by the parameter mu, -inf < mu < inf, is called the location family with standard pdf f(x), and mu is called the location parameter of the family.

Scale Family: Let f(x) be any probability density function. Then for any sigma > 0, the family of probability density functions (1/sigma)f(x/sigma), indexed by the parameter sigma, is called the scale family with standard pdf f(x), and sigma is called the scale parameter of the family.

Location-Scale Family: Let f(x) be any probability density function. Then for any mu, -inf < mu < inf, and any sigma > 0, the family of probability density functions (1/sigma)f((x - mu)/sigma), indexed by the parameters (mu, sigma), is called the location-scale family with standard pdf f(x); mu is called the location parameter and sigma is called the scale parameter.

Recall that in the first semester we studied the logistic distribution with parameters mu and beta. The logistic density function is given by f(x) = (1/beta)*(exp(-(x-mu)/beta))/(1 + exp(-(x-mu)/beta))^2, -inf < mu < inf, beta > 0. Note that the logistic family is a location-scale family with location parameter mu and scale parameter beta. Now we shall write our own function to generate random numbers from the logistic distribution with parameters mu and beta. Since the distribution function can be computed explicitly from the logistic density, it is easy to apply the probability integral transform (inverse CDF) method. Explicit computation (done during the lecture) gives the logistic distribution function as F(x) = 1/(1 + exp(-(x-mu)/beta)). So if u is a uniform random number from [0,1], then the inverse of F at u gives a logistic random variable: Inv(F)(u) = mu + beta * log(u/(1-u)). The following function generate_log(n, mu, beta) utilizes this fact to generate random numbers from the logistic(mu, beta) distribution. We also compare our results with the generator rlogis() in R by means of histograms; the corresponding probability density function is overlaid on the histogram for verification.

generate_log = function(n = 10, mu = 1, beta = 2){

# n: number of random values, mu: location parameter, beta: scale parameter

if(beta <= 0){

stop("Invalid scale parameter")

}

u = runif(n, min = 0, max = 1)

x = mu + beta * log(u/(1-u))

return(x)

}

par(mfrow = c(1,2))

mu = 5

beta =3

x = generate_log(n=10000, mu=5, beta = 3)

hist(x, probability = TRUE, ylim = c(0,0.08),

main = "Generated using inverse transform")

logistic_pdf = function(x){

(1/beta)*(exp(-(x-mu)/beta))/(1+exp(-(x-mu)/beta))^2

}

curve(logistic_pdf(x), -20, 30, lwd=2,

col = "red", add = TRUE)

y = rlogis(n = 10000, location = 5, scale = 3)

hist(y, probability = TRUE, main = "Generated using rlogis()")

# Checking that a valid probability density function integrates to 1

exp_pdf = function(x){

exp(-x)

}

integrate(exp_pdf, 0, Inf)

# integrate() works for any integrable function, e.g. x^2 over (-1, 1)

f = function(x){

x^2

}

integrate(f, -1, 1)

Convergence in Probability: This is a very important concept in statistics and plays a crucial role in statistical simulation. In the lectures we have already seen how to prove convergence in probability (recall the use of Chebyshev's inequality). We shall develop R code to visualize this concept. One magical theorem we studied in class is the Weak Law of Large Numbers (WLLN). The WLLN states that if we generate a random sample of size n from a population distribution with finite mean (mu, say) and finite variance, then as n increases the sample mean converges to the population mean (mu) in probability. In layman's terms, for large n the sample mean values will be highly concentrated about mu. In the following code we visualize this fact by simulating random numbers from the uniform distribution on the interval [a,b]. The mean is mu = (a+b)/2 and the variance is (b-a)^2/12, so the population distribution has finite mean and finite variance; hence the WLLN applies. For each n we generate n uniform random numbers from [a,b] and compute the sample mean bar(x_n); then we plot n versus bar(x_n) for a range of n values.

par(mfrow= c(1,1))

a = 1; b = 10

n = 10000 # sample size

x_mean = numeric(n)

for(i in 1:n){

x = runif(i, min = a, max = b)

x_mean[i] = mean(x)

}

plot(1:n, x_mean, type= "p", col = "red")

abline(h = (a+b)/2, col = "blue") # sample means are concentrated about this line

We can visualize the same thing in terms of distributions as well: the random variables bar(X_n) become asymptotically degenerate at mu. So here we plot the distribution of the sample mean for different sample sizes. Please understand the logic of each loop; first visualize it, sketch it on a piece of paper, and then start writing the code. Understand the graphical output clearly and match it with the lecture notes (theory class).

n = c(5, 20, 100, 500)

m = 1000 # number of experiments to generate the distribution of bar(X_n)

a = 1; b =5

par(mfrow=c(2,2))

for(i in n){

x_mean = numeric(m)

for(j in 1:m){

x = runif(i, min = a, max = b)

x_mean[j] = mean(x)

}

hist(x_mean, probability = TRUE,

xlim=c(a,b), main = paste("n = ", i))

abline(v = (a+b)/2, col = "red", lwd=2)

}

Exercise: In the above exercise different histograms are plotted in different plotting regions for different values of n. Can we combine all the histograms in a single plot having different colors? Add a legend to this plot for different values of n with matching colors.
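One possible sketch for this exercise (my illustration, one of several ways): use semi-transparent colors and hist(..., add = TRUE) to overlay all histograms in a single plotting region.

n_vals = c(5, 20, 100, 500)
m = 1000
a = 1; b = 5
cols = c(rgb(1,0,0,0.4), rgb(0,0,1,0.4), rgb(0,1,0,0.4), rgb(0.5,0.5,0.5,0.4)) # transparent colors
par(mfrow = c(1,1))
plot(NULL, xlim = c(a, b), ylim = c(0, 10), xlab = "sample mean", ylab = "density")
for(k in 1:length(n_vals)){
  x_mean = numeric(m)
  for(j in 1:m){
    x_mean[j] = mean(runif(n_vals[k], min = a, max = b))
  }
  hist(x_mean, probability = TRUE, col = cols[k], add = TRUE)
}
abline(v = (a+b)/2, lwd = 2)
legend("topright", legend = paste("n =", n_vals), fill = cols, bty = "n")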

Statistical Simulation (Lecture-IV Date: 22nd January 2018)

The agenda of this lecture is to visualize the concept of convergence in distribution. Before that, let us refresh ourselves with a problem on convergence in probability. In class, a homework problem was given in which you were asked to show that if X_1, X_2, ..., X_n is a random sample from uniform(0, theta), then X_(n) = max(X_1, ..., X_n) converges to theta in probability as n tends to infinity. Here we shall visualize this fact. To make it more general, we generate the uniform random numbers from the interval [a,b], compute the quantities x_(n) and x_(1), and plot them for increasing sample size. For large n, x_(1) will be highly concentrated about a and x_(n) will be highly concentrated about b.

par(mfrow=c(1,1))

a= 3; b = 6

n = 1:1000

x_1 = numeric(length = length(n))

x_n = numeric(length = length(n))

for(i in 1:length(n)){

x = runif(i, min = a, max = b)

x_1[i] = min(x)

x_n[i] = max(x)

}

plot(n, x_1, col = "red", ylim = c(a-1, b+2), lwd=2,

ylab = "max(x) and min(x)", xlab = "sample size (n)")

points(n, x_n, col = "green", lwd = 2)

legend("topright", legend = c("max(x)", "min(x)"),

lwd = c(2,2), col = c("green", "red"), bty= "n")

Convergence in Distribution: In continuation of the previous problem, we have also seen that the quantity n(1-X_(n)) converges in distribution to an exponential random variable with mean 1, where X_(n) = max(X_1, ..., X_n) and the random numbers are generated from the uniform[0,1] distribution. Let us check whether the distributions are converging to the exponential or not. The algorithm works in the following way: fix n = 5, then generate m = 1000 samples of size n. For each of the m samples we have n uniform random numbers from [0,1]; compute the maximum of these n numbers (x_(n), say) and evaluate n(1-x_(n)). This is a realization of the random variable n(1-X_(n)) [note the difference between the use of small x and capital X]. Now draw the histogram of these values. Repeat the process for n = 10, 20, 100 and see that the histograms come closer to the exponential density function (with lambda = 1).

n = c(5, 10, 20 ,100)

m = 1000

par(mfrow=c(2,2))

for(i in n){

x = matrix(NA, nrow = m, ncol = i)

y = numeric(m)

for(j in 1:m){

x[j,] = runif(i, min =0, max=1)

y[j] = i*(1 -max(x[j,]))

}

hist(y, probability = TRUE, col = "grey", main = paste("n=",i))

curve(exp(-x), add= TRUE, col = "red")

}

Central Limit Theorem: One of the most remarkable theorems in statistics. It says that if we have independent and identically distributed random variables following some distribution with finite mean (mu) and finite variance (sigma^2), then the distribution of the random variable Y_n = sqrt(n)*(bar(X_n)-mu)/sigma can be approximated by the standard normal distribution. That is, Y_n converges to N(0,1) in distribution as n becomes large. Recall that while proving the theorem we assumed that the moment generating function of the population distribution exists; this may not be the case in general. In the following code we demonstrate the central limit theorem using random samples from the uniform[a,b] distribution. Experiment with some other distributions such as the exponential, binomial, Poisson, etc.

a = 2

b = 5

mu = (a+b)/2

sigma = (b-a)/sqrt(12)

m = 1000

par(mfrow=c(2,2))

for(n in 1:4){

y = numeric(m)

for(i in 1:m){

x = runif(n, min = a, max = b)

y[i] = sqrt(n)*(mean(x)-mu)/sigma

}

hist(y, probability = TRUE, main = paste("n=",n), xlim = c(-3,3))

curve(expr = (1/sqrt(2*pi))*exp(-x^2/2), col = "red", lwd =2, add = TRUE)

}

Statistical Simulation (Lecture-V Date: 30th January 2018)

Let us recall the utility of statistical simulation. First, we are able to generate a large number of artificial observations from a theoretical probability distribution. Simulation can then be used to evaluate the performance of statistical methods, which is otherwise not possible with real data because the model parameters are unknown. It also plays a very important role in estimating the probability distribution of a random variable whose theoretical distribution is difficult to obtain. We have also verified some probability results using simulation. Today we shall mainly concentrate on simulating random numbers from some discrete distributions.

Suppose we are interested in generating a random number from the binomial(n,p) distribution. Suppose U_1, U_2, ..., U_n are independent uniform(0,1) random numbers. For each i = 1 to n, if U_i <= p, set X_i = 1; otherwise set X_i = 0. Then X = X_1 + X_2 + ... + X_n is an observation from the binomial(n,p) distribution. A sketch of this method is given below.
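A minimal sketch of the above method (the function name gen_binom and the values n = 10, p = 0.3 are only illustrative; rbinom() is the built-in alternative):

gen_binom = function(m, n, p){
  # m: how many binomial(n, p) observations to generate
  x = numeric(m)
  for(i in 1:m){
    u = runif(n)        # n independent uniform(0,1) numbers
    x[i] = sum(u <= p)  # number of "successes"
  }
  return(x)
}
x = gen_binom(m = 10000, n = 10, p = 0.3)
mean(x); 10*0.3                                    # sample mean should be close to n*p
hist(x, probability = TRUE, col = "grey", breaks = seq(-0.5, 10.5, by = 1))
points(0:10, dbinom(0:10, size = 10, prob = 0.3), col = "red", pch = 19) # theoretical pmf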

Suppose we are interested in generating a random number from the geometric(p) distribution. Go on generating U_1, U_2, ... from the uniform(0,1) distribution and set X = min{i : U_i <= p}. Then X follows the geometric(p) distribution. Since the expected value of X is 1/p, X is likely to be large if p is very small. An alternative method: if U follows the uniform(0,1) distribution, then X = [ln(U)/ln(1-p)] + 1 (where [.] denotes the integer part) is a random variable with the geometric(p) distribution. Sketches of both methods are given below.
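Minimal sketches of the two methods (my illustration; note that R's built-in rgeom() counts failures before the first success, i.e. it returns X - 1):

gen_geom_loop = function(p){
  i = 1
  while(runif(1) > p) i = i + 1   # keep generating uniforms until U_i <= p
  return(i)
}
gen_geom_inv = function(m, p){
  u = runif(m)
  floor(log(u)/log(1 - p)) + 1    # inverse transform formula
}
x = replicate(10000, gen_geom_loop(0.2))
y = gen_geom_inv(10000, 0.2)
mean(x); mean(y); 1/0.2           # all three should be close to 1/p = 5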

Suppose we are interested in generating a random number from the Poisson(lambda) distribution. Go on generating U_0, U_1, U_2, ... from uniform(0,1). If U_0 <= exp(-lambda), set X = 0; otherwise set X = min{i : U_0*U_1*...*U_(i-1) > exp(-lambda) and U_0*U_1*...*U_(i-1)*U_i <= exp(-lambda)}. Then X follows a Poisson(lambda) distribution. A sketch is given below.
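A minimal sketch of the product-of-uniforms method (my illustration; rpois() is the built-in alternative, and lambda = 4 is arbitrary):

gen_pois = function(lambda){
  prod_u = runif(1)  # this is U_0
  x = 0
  while(prod_u > exp(-lambda)){   # multiply in new uniforms until the product drops below e^(-lambda)
    prod_u = prod_u * runif(1)
    x = x + 1
  }
  return(x)
}
x = replicate(10000, gen_pois(4))
mean(x); var(x)                   # both should be close to lambda = 4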

Statistical Simulation: (Lecture - VI Date: 10th February 2018 )

# Bootstrap: resample with replacement from the observed sample to approximate

# the sampling distributions of the sample mean and the sample SD

x = c(1, 7, 2, 3, 4, 9, 11) # sample vector

B = 1000 # Number of bootstrap samples to be generated

out = matrix(NA, nrow = B, ncol= length(x))

b_mean = numeric(B)

b_sd = numeric(B)

for(i in 1:B) {

out[i,] = sample(x, size = length(x), replace = TRUE)

b_mean[i] = mean(out[i,])

b_sd[i] = sd(out[i,])

}

mean(b_mean) # Bootstrap mean of sample mean

sd(b_mean) # Bootstrap standard error of the sample mean

b.ci.mean = c(quantile(b_mean, c(2.5, 97.5)/100)) # 95% bootstrap percentile CI for the mean

mean(b_sd) # Bootstrap estimate of population standard deviation

sd(b_sd) # Bootstrap standard error of population sd

b.ci.sd = c(quantile(b_sd, c(2.5, 97.5)/100)) # 95% bootstrap percentile CI for the SD

par(mfrow = c(1,2))

hist(b_mean, probability = TRUE, col = 2, main="Sample Mean (bootstrap)")

hist(b_sd, probability = TRUE, col = 2, main = "sample SD (bootstrap)")

# Bootstrapping a Linear Regression model

p = 2 # The number of predictors

b0 = 1 # Assumed value of the intercept

b1 = 2 # Assumed coefficient of X1

b2 = 3 # Assumed coefficient of X2

sd = 0.2 # Assumed population standard deviation of eps

n = 30 # Number of data points

set.seed(1)

X1 = runif(n, 1, 10)

X2 = runif(n, 1, 10)

Y = numeric(n)

Y = b0 + b1*X1 + b2*X2 + rnorm(n, 0, sd = sd) # Generate the response variable

print(Y)

X = cbind(rep(1, n), X1 , X2) # Create the design matrix

print(head(X, n =7)) # Print first seven rows of the data matrix

bhat = solve(t(X)%*%X)%*%t(X)%*%Y # Parameter estimates

print(bhat)

# Bootstrapping the regression model

m = 1000

b_hat = matrix(data =NA, nrow = m, ncol = 3)

for (i in 1:m) {

ind = sample(1:n, size = n, replace = TRUE)

bX = X[ind, ]

bY = Y[ind]

b_hat[i, ] = solve(t(bX)%*%bX)%*%t(bX)%*%bY

}

head(b_hat)

# Bootstrap distribution of the estimators of the regression model parameters

par(mfrow = c(2,2))

hist(b_hat[,1], probability = TRUE, xlab = "b0", main = "")

hist(b_hat[,2], probability = TRUE, xlab = "b1", main = "")

hist(b_hat[,3], probability = TRUE, xlab = "b2", main = "")

# Demonstration of Delta Method

# Determination of the confidence interval for a proportion.

# Consider the problem of estimating the population proportion p of a bin(1,p) population.

# We draw a random sample of size n and store the sample mean. Continue this process a large

# number of times so that the sampling distribution of the sample mean can be visualized by

# means of a histogram. It is interesting to see that the histogram looks like a normal

# distribution. However, this is not a surprising outcome (thanks to the Central Limit Theorem).

n_sample = c(5, 10, 15, 20) # sample sizes to be considered

rep = 1000 # number of replications

p = 0.3 # true proportion in the population

par(mfrow=c(2,2))

for(n in n_sample){

mat = matrix(NA, nrow = rep, ncol = n) # matrix to hold the observations

for( i in 1:rep){

mat[i, ] = rbinom(n, 1, p)

}

# Each row of mat contains a sample of size n from a binomial(1,p) population.

# Let us compute the sampling distribution of the sample proportion.

p_hat = rowMeans(mat) # sample proportion for each replication

# p_hat is an unbiased estimator. So average of p_hat values should be close to p (true)

mean(p_hat) # which is close to the population proportion

# theoretical sd of p_hat is sqrt(p(1-p)/n)

sd_p_hat = sqrt(p*(1-p)/n); sd_p_hat

# estimation of the sd of p_hat

est_sd_p_hat = sqrt(p_hat*(1-p_hat)/n)

mean(est_sd_p_hat) # usually we have only a single sample of size n

# approximation by normal distribution

hist(p_hat, col = "grey", prob = T, main = paste("n=", n), xlab = expression(hat(p)), ylim =c(0,4))

curve(dnorm(x, mean = mean(p_hat), sd = mean(est_sd_p_hat)), lwd=2, col = "red", add = TRUE)

}

Suppose that we observe X_1, X_2, ..., X_n independent Bernoulli(p) random variables. In the previous code we were interested in estimating the success probability p, but another popular parameter is p/(1-p), the odds. For example, if the data represent the outcomes of a medical treatment with p = 2/3, then a person has odds 2:1 of getting better. Moreover, if there were another treatment with success probability r, then the odds ratio [p/(1-p)]/[r/(1-r)] is of interest, as it gives the relative odds of one treatment over the other. The typical estimate of the success probability p is p_hat = (X_1 + X_2 + ... + X_n)/n, and we might consider using p_hat/(1-p_hat) as an estimate of p/(1-p). What are the properties of this estimator? How might we estimate the variance of p_hat/(1-p_hat)? How can we approximate the sampling distribution of p_hat/(1-p_hat)? The delta method answers these questions; a sketch for the odds is given directly below, and the code that follows it compares bootstrap and large-sample (asymptotic normal) confidence intervals for p_hat itself.
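A minimal delta-method sketch for the odds g(p) = p/(1-p) (my illustration; the values p = 0.3 and n = 100 are arbitrary): since g'(p) = 1/(1-p)^2, the approximate variance of g(p_hat) is [g'(p)]^2 * p(1-p)/n = p/(n*(1-p)^3).

par(mfrow = c(1,1))
set.seed(1)
p = 0.3; n = 100
rep = 5000
odds_hat = numeric(rep)
for(i in 1:rep){
  x = rbinom(n, 1, p)
  odds_hat[i] = mean(x)/(1 - mean(x))
}
hist(odds_hat, probability = TRUE, col = "grey", main = "Sampling distribution of the odds estimator")
delta_var = p/(n*(1 - p)^3)   # delta-method approximation of Var(p_hat/(1-p_hat))
curve(dnorm(x, mean = p/(1 - p), sd = sqrt(delta_var)), col = "red", lwd = 2, add = TRUE)
var(odds_hat); delta_var      # the two variances should be close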

par(mfrow=c(1,1))

set.seed(123)

p = 0.3 # population proportion (always unknown)

n_sample = c(10, 20, 30, 50) # sample size

for(n in n_sample){

x = rbinom(n, 1, p) # generate a sample of size n

p_hat = mean(x)

m = 500 # number of bootstrap replication

mat = matrix(NA, nrow = m, ncol = n)

for(i in 1:m){

mat[i, ] = sample(x, size = n, replace = TRUE)

}

boot_mean = mean(rowMeans(mat))

hist(rowMeans(mat), prob = T, col = "grey", main = paste("n=", n), xlab = expression(hat(p))) # bootstrap distribution of p_hat

boot_sd = sd(rowMeans(mat))

boot_ci = quantile(rowMeans(mat), c(0.025, 0.975)) # nonparametric (percentile) confidence interval

boot_ci_normal = c(boot_mean - 1.96*boot_sd, boot_mean + 1.96*boot_sd) # normal-type bootstrap CI

ci_large_sample = c(p_hat - 1.96*sqrt(p_hat*(1-p_hat)/n), p_hat + 1.96*sqrt(p_hat*(1-p_hat)/n)) # large-sample CI

cat("The bootstrap confidence interval, nonparametric (percentile) type\n")

print(boot_ci)

cat("The bootstrap confidence interval, normal type\n")

print(boot_ci_normal)

cat("The confidence interval based on asymptotic normality\n")

print(ci_large_sample)

abline(v = boot_ci, col = "red", lwd = 2)

abline(v = boot_ci_normal, col = "blue", lwd = 2)

abline(v = ci_large_sample, col = "green", lwd =2)

legend("topright", legend = c("nonparamertic CI", "normal CI", "large sample CI"), col = c("red","blue","green"), lwd=c(2,2,2), bty = "n")

curve(dnorm(x, mean = p_hat, sd = sqrt(p_hat*(1-p_hat)/n)), col ="green", lwd=2, add = TRUE)

}

Demonstration of the Multivariate Delta Method: the ratio-type estimator Xbar/Ybar for estimating mu_x/mu_y. The following program was mainly written by Rahul Pal and Dipali Mestry (M.Sc. students).

library(MASS)

sigma1=2

sigma2=3

rho=0.4

sigma=matrix(c((sigma1)^2,rho*sigma1*sigma2,rho*sigma1*sigma2,(sigma2)^2),2,2,byrow=T)

sigma

mu1=20

mu2= 10

param = mu1/mu2

rep = 10000

n = 1000

est_ratio = numeric(rep)

for( i in 1:rep){

mat = mvrnorm(n = n, mu = c(mu1,mu2), Sigma = sigma)

est_ratio[i] = mean(mat[,1])/mean(mat[,2])

}

hist(est_ratio, prob = T, col = "grey")

points(mu1/mu2, 0, lwd=2, pch ="*", col = "blue", cex = 3)

delta_var_true = (mu1/mu2)^2*((sigma1/mu1)^2 +(sigma2/mu2)^2 - 2*rho*sigma1*sigma2/(mu1*mu2))/n

print(delta_var_true)

var(est_ratio)

curve(dnorm(x, mean = mean(est_ratio), sd = sqrt(delta_var_true)), lwd=2, col = "red", add =TRUE)

mat = mvrnorm(n = n, mu = c(mu1,mu2), Sigma = sigma)

delta_var_est = (mean(mat[,1])/mean(mat[,2]))^2*((sd(mat[,1])/mean(mat[,1]))^2 +(sd(mat[,2])/mean(mat[,2]))^2 - 2*cor(mat[,1], mat[,2])*sd(mat[,1])*sd(mat[,2])/(mean(mat[,1])*mean(mat[,2])))/n

curve(dnorm(x, mean = mean(mat[,1])/mean(mat[,2]), sd = sqrt(delta_var_est)), lwd=2, col = "blue", add =TRUE)


Bivariate normal distribution. Typically the mvrnorm() function from the MASS package is used to generate a random sample from a population whose distribution is characterized by a bivariate (or multivariate) normal distribution with a mean vector and a covariance matrix (a symmetric positive definite matrix). However, in the following we use a well-known fact about the multivariate normal distribution to generate the data values.

mu_1 = 3; mu_2 = 4; sigma_1 = 1; sigma_2 = 2; rho = 0.4

n = 1000 #number of bivariate samples

x = numeric(n)

y = numeric(n)

# The theorem says that given X = x, Y is normally distributed with mean mu_2 + rho*(sigma_2/sigma_1)*(x-mu_1) and sd sigma_2*sqrt(1-rho^2). This fact can be used to generate data from a bivariate normal distribution.

for(i in 1:n){

x[i] = rnorm(1, mu_1, sigma_1)

y[i] = rnorm(1, mu_2 + rho*(sigma_2/sigma_1)*(x[i]-mu_1),

sd = sigma_2*sqrt(1-rho^2))

}

hist(x, prob = T, col = "grey")

curve(dnorm(x, mu_1, sd = sigma_1), col = "red", lwd=2, add = TRUE)

hist(y, prob = T, col = "grey")

curve(dnorm(x, mu_2, sd = sigma_2), col = "red", lwd=2, add= TRUE)

library(hexbin)

plot(hexbin(data.frame(x,y)))

library(car)

dataEllipse(x,y, levels = c(0.975), col = "red")

# Relationship among the distributions!

# For large lambda, the Poisson distribution can be approximated by a normal distribution

lambda = c(1, 5, 10, 20)

n = 1000

par(mfrow=c(2,2))

for(lam in lambda){

x = rpois(n, lambda = lam)

hist(x, probability = TRUE)

}
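# A small extension (my addition, not from the lecture): overlay the approximating
# normal density N(lambda, lambda) on each histogram to see the approximation improve as lambda grows.
lambda = c(1, 5, 10, 20)
n = 1000
par(mfrow = c(2,2))
for(lam in lambda){
  x = rpois(n, lambda = lam)
  hist(x, probability = TRUE, main = paste("lambda =", lam))
  curve(dnorm(x, mean = lam, sd = sqrt(lam)), col = "red", lwd = 2, add = TRUE) # N(lambda, lambda) density
}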

# Shape of the binomial distribution for a range of p values. Note how the location of the mode shifts in the histograms.

par(mfrow=c(2,2))

p = c(0.1, 0.3, 0.6, 0.9)

m = 30

n= 100

for(prob in p ){

x = rbinom(n, size =m, prob = prob)

hist(x, prob = TRUE, col = "grey", xlim=c(0,m))

}

# For small p and large m, the binomial(m,p) distribution can be approximated by the Poisson(lambda = m*p) distribution

par(mfrow=c(2,2))

p = 0.01

m = c(20, 50, 100, 300)

n = 10000

for(size in m){

x = rbinom(n, size = size, prob = p)

hist(x, prob =T, xlim=c(0, 10), border="red" )

x = rpois(n, lambda = size*p)

hist(x, prob =T, xlim=c(0, 10), add= TRUE, border = "blue" )

}

# The sum of n Bernoulli(p) random variables follows the bin(n,p) distribution. The code was mainly written by Keshava Rao (M.Sc. student); I only made some modifications to display the graph in a better way.

n = 10 # number of trials in the coin tossing experiment

p = 0.5

rep = 10000 # number of repetitions of n trials

y=numeric(rep)

sum=0

par(mfrow=c(1,2))

x=matrix(NA,rep,n)

for (i in 1:rep) {

for (j in 1:n) {

x[i,j]=rbinom(1,1,p)

sum=sum+x[i,j]

}

y[i]=sum

sum=0

}

x

y

proportion = numeric(n+1)

for(i in 0:n){

proportion[i+1] = sum(y==i)/rep

}

par(mfrow=c(1,1))

plot(0:n, proportion, type= "p", col = "blue", pch="o")

points(0:n, dbinom(0:n, n, p),pch = '*', lwd=2, cex =2, col = "red")

legend("topright", c("Empirical", "Theoretical"), col=c("blue","red"), pch = c("o","*"),lwd=c(1,1) )