Probability and Statistics with R Programming

This webpage caters to the basic probability and statistics requirements of undergraduate engineering students using R Programming. It covers only the basic topics of Applied Mathematics - I (MAT 1101) for Bachelor of Chemical Engineering students at the ICT Main Campus. Thanks to the encouragement of the BChem students, in particular Arya Shah, for putting these online. Over time we will try to make this page more comprehensive so that it covers the foundations of data science applications for our engineering students.

To get the most out of R programming, it is important to refresh probability distributions and a few statistical concepts first. The following lectures should give a decent overview and help students connect the programming with the corresponding theory. Thanks to Dr. Sandeep Bhairat for his invitation and encouragement to modularize these concepts for our students at the Jalna Campus. However, any ICT student with a valid login should be able to view these lectures.

Lectures on R Programming

Lecture III (2 Hours)

1) Concept of population and sample

2) Functions of a random sample: mean and variance of the sample mean

3) The Central Limit Theorem and visualization of the sampling distribution of the sample mean for the Poisson, Normal, and Exponential distributions. Investigation of the sampling distribution for increasing sample size: as the sample size increases, the histograms approach the normal density (see the sketch after this list).

4) Computation of a 95% confidence interval for the mean of a normal population. Approximate 95% confidence interval for lambda of the Poisson distribution. Example computations using R.

5) The connection between testing of hypotheses and confidence intervals is established.
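
A minimal sketch of items 3 and 4, assuming an Exponential(1) population for the CLT visualization; the sample sizes, replication count, and simulated data for the confidence intervals are illustrative choices rather than lecture material.

```r
# Sampling distribution of the sample mean for an Exponential(1) population.
# As n grows, the histogram of sample means approaches a normal density (CLT).
set.seed(1)
m <- 1000                                   # number of replications
par(mfrow = c(2, 2))
for (n in c(2, 10, 30, 100)) {
  xbar <- replicate(m, mean(rexp(n, rate = 1)))
  hist(xbar, probability = TRUE, breaks = 30,
       main = paste("n =", n), xlab = "sample mean")
  curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, col = "red")
}

# 95% confidence interval for a normal mean (sigma unknown):
x <- rnorm(25, mean = 5, sd = 2)
mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))

# Approximate 95% confidence interval for Poisson lambda (CLT-based):
y <- rpois(50, lambda = 4)
mean(y) + c(-1, 1) * 1.96 * sqrt(mean(y) / length(y))
```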

Exercise 1: Simulate n = 10 random samples from Binomial(1, p) (tossing a coin 10 times). This gives a sequence of zeros and ones. Compute the proportion of ones in the sequence; this is an approximate value of p, call it p_hat. However, this approximation is subject to uncertainty. Suppose you repeat this process m = 100 times and obtain the sampling distribution of p_hat. Can you relate the histogram of the p_hat values to a well-known distribution? State the theorem.
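
One possible solution sketch, taking p = 0.5 for illustration:

```r
# Sampling distribution of p_hat from Binomial(1, p) samples of size n = 10.
set.seed(42)
n <- 10; m <- 100; p <- 0.5                 # p = 0.5 is an illustrative choice
p_hat <- replicate(m, mean(rbinom(n, size = 1, prob = p)))
hist(p_hat, probability = TRUE, main = "Sampling distribution of p_hat")
# By the Central Limit Theorem, p_hat is approximately
# Normal(p, p(1 - p)/n) for large n.
```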

Exercise 2: Repeat Exercise 1 for n = 3, 5, 20, 50 and draw the histograms of p_hat in a single plot window.

Link to the Video: Click Here

Link to the R File: Click Here

Link to the Writing Notes: Click Here


Lecture II (2 Hours)

1) Continuous probability density functions (PDFs) and examples

2) Using R to check whether a function is a PDF, and how a non-negative integrable function can be converted into a PDF

3) Use of the integrate() function in R

4) Computing the expectation and variance of continuous random variables using the integrate() function

5) The four prefixes "r", "d", "p", and "q": examples include rnorm(), rbinom(), dnorm(), dexp(), pnorm(), and qnorm()

6) Understanding the distribution of data (histogram) and its connection to the underlying population density function: as the sample size increases, the histogram gets closer to the population density function (see the sketch after this list)
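
A minimal sketch of items 5 and 6; the standard normal distribution and the sample sizes are illustrative choices.

```r
# The four prefixes, shown for the normal distribution:
rnorm(5)        # r: random numbers
dnorm(0)        # d: density at a point
pnorm(1.96)     # p: cumulative probability P(X <= 1.96)
qnorm(0.975)    # q: quantile, the inverse of pnorm

# Histograms of data versus the population density for growing sample size:
set.seed(7)
par(mfrow = c(1, 3))
for (n in c(50, 500, 5000)) {
  hist(rnorm(n), probability = TRUE, breaks = 30, main = paste("n =", n))
  curve(dnorm(x), add = TRUE, col = "red")
}
```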

Exercise 1: Write your own function for the Exponential(lambda) PDF. Check whether the area under the curve is one. Compute the expected value and the variance using the integrate() function and verify them against the analytical results.
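
One possible solution sketch, taking lambda = 2 as an illustrative rate; analytically the mean is 1/lambda and the variance is 1/lambda^2.

```r
# Hand-written Exponential(lambda) PDF, checked with integrate().
lambda <- 2                                  # illustrative choice
f <- function(x) lambda * exp(-lambda * x)

integrate(f, 0, Inf)$value                               # area under the curve: 1
EX  <- integrate(function(x) x   * f(x), 0, Inf)$value   # E(X)   = 1/lambda
EX2 <- integrate(function(x) x^2 * f(x), 0, Inf)$value   # E(X^2) = 2/lambda^2
EX2 - EX^2                                               # variance = 1/lambda^2
```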

Exercise 2: Simulate n = 10, 20, 30, ..., 1000 random numbers from the normal distribution with mean 0 and variance 1. For each set of numbers, compute the average and the variance. Plot these mean and variance values as functions of n. What do you observe?
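
One possible solution sketch; by the Law of Large Numbers, both estimates should stabilize around the true values 0 and 1 as n grows.

```r
# Sample mean and sample variance as functions of the sample size n.
set.seed(3)
ns <- seq(10, 1000, by = 10)
means <- sapply(ns, function(n) mean(rnorm(n)))
vars  <- sapply(ns, function(n) var(rnorm(n)))
par(mfrow = c(1, 2))
plot(ns, means, type = "l", xlab = "n", ylab = "sample mean")
abline(h = 0, col = "red")                   # true mean
plot(ns, vars, type = "l", xlab = "n", ylab = "sample variance")
abline(h = 1, col = "red")                   # true variance
```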

Exercise 3: Fix n = 30. Simulate a random sample of size n = 30 from the normal distribution with mean 0 and variance 1 and compute the average. Repeat this exercise m = 500 times, so that you have 500 average values. Draw a histogram of these average values. How does it look? Can you identify the distribution? What is its variance?

Link to the Video: Click Here

Link to the R File: Click Here


Lecture I (2 Hours)

1) Introduction to R Programming

2) Vectors, matrices, data frames, and lists

3) Operations on matrices

4) Loops and nested loops

5) Writing mathematical functions and basic plotting options (including multiple plots)

6) Writing custom functions: addition of matrices, multiplication of matrices (homework), computing the factorial (homework)

7) Probability mass functions and their plots (Binomial and Poisson); see the sketch after this list
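
A minimal sketch of items 6 and 7: a loop-based factorial (one of the homework functions) and PMF plots; the parameters Binomial(10, 0.3) and Poisson(3) are illustrative choices.

```r
# Custom factorial using a loop (homework sketch).
my_factorial <- function(n) {
  result <- 1
  for (i in seq_len(n)) result <- result * i
  result
}
my_factorial(5)                              # 120

# Probability mass function plots for Binomial and Poisson.
par(mfrow = c(1, 2))
x <- 0:10
plot(x, dbinom(x, size = 10, prob = 0.3), type = "h",
     main = "Binomial(10, 0.3)", ylab = "P(X = x)")
plot(x, dpois(x, lambda = 3), type = "h",
     main = "Poisson(3)", ylab = "P(X = x)")
```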

Link to the Video: Click Here

Link to the R File: Click Here


Basic Probability Distributions and Statistical Ideas

Lecture I (2 Hours):

1) Review of Probability Concepts

2) Conditional Probability and Bayes' Theorem

3) Theorem of Total Probability and Partition of the Sample Space (see the sketch after this list)
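
A small numerical illustration of the theorem of total probability and Bayes' theorem; the partition probabilities below are invented for illustration.

```r
# Partition B1, B2, B3 of the sample space with probabilities P(Bi),
# and conditional probabilities P(A | Bi).
prior <- c(0.5, 0.3, 0.2)                    # P(B1), P(B2), P(B3): illustrative
condA <- c(0.1, 0.2, 0.4)                    # P(A | Bi): illustrative

pA <- sum(condA * prior)                     # theorem of total probability
posterior <- condA * prior / pA              # Bayes' theorem: P(Bi | A)
pA
posterior
```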

Link to the Video: Click Here


Lecture II (2 Hours):

1) Definition of a random variable (a function from the sample space to the real numbers)

2) Probability Mass Function with an example (Discrete Random Variable)

3) Cumulative Distribution Function with example (step function and continuous function) and its properties

4) Probability Density Function (Continuous Random Variable)

5) Checking whether given functions are a PMF, PDF, or CDF (important from the exam point of view); see the sketch after this list

6) Probability Mass Functions introduced: Binomial(n, p) and Poisson(lambda); two exercises are given

7) Probability Density Functions introduced: Exponential(lambda), Normal(mu, sigma^2), and Uniform(a, b)
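
A minimal sketch of item 5: checking in R whether a candidate function is a valid PMF (non-negative, sums to one) or a valid PDF (non-negative, integrates to one); both candidates are illustrative.

```r
# Candidate PMF on x = 0, 1, 2, 3 (illustrative):
p <- c(0.1, 0.2, 0.3, 0.4)
all(p >= 0) && isTRUE(all.equal(sum(p), 1))  # TRUE => valid PMF

# Candidate PDF f(x) = 3 x^2 on (0, 1) (illustrative):
f <- function(x) 3 * x^2
integrate(f, 0, 1)$value                     # 1 => valid PDF
```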

Link to the Video: Click Here


Lecture III (1 Hour):

1) Expectation and moments

2) Computation of Variance with examples

3) Physical interpretation of Expectation as the center of mass

Link to the Video: Click Here


Lecture IV (2 Hours):

1) Moment Generating Function and Computation of Moments (see the sketch after this list)

2) Joint Probability Mass Function
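
A sketch of item 1 for the standard normal, whose MGF has the closed form M(t) = exp(t^2/2); the numerical check with integrate() and the symbolic differentiation with D() are illustrative choices, not necessarily the lecture's method.

```r
# MGF of X ~ N(0, 1): M(t) = E[exp(tX)], closed form exp(t^2 / 2).
M_num <- function(t) integrate(function(x) exp(t * x) * dnorm(x), -Inf, Inf)$value
c(numeric = M_num(0.5), closed = exp(0.5^2 / 2))   # both approx 1.1331

# Moments from derivatives of the MGF at t = 0:
dM  <- D(expression(exp(t^2 / 2)), "t")      # M'(t)
ddM <- D(dM, "t")                            # M''(t)
t <- 0
eval(dM)                                     # E(X)   = 0
eval(ddM)                                    # E(X^2) = 1, so Var(X) = 1
```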

Link to the Video: Click Here


Lecture V (2 Hours):

Discussion of questions and tutorial problems (important from the examination point of view)

Link to the Video: Click Here


Lecture VI (2 Hours):

1) Properties of the Normal distribution

2) Computation of moments and absolute moments of order alpha for the Normal distribution

3) Poisson approximation of the Binomial distribution (see the sketch after this list)

4) Joint Cumulative Probability Distribution Function

5) Joint Probability Density Function with an example and graphical illustration
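
A quick numerical check of item 3; the values n = 1000 and p = 0.003 (so lambda = np = 3) are illustrative.

```r
# Binomial(n, p) with large n and small p is close to Poisson(np).
n <- 1000; p <- 0.003
x <- 0:10
round(cbind(binomial = dbinom(x, size = n, prob = p),
            poisson  = dpois(x, lambda = n * p)), 5)
```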

Link to the Video: Click Here


Lecture VII (2 Hours):

1) Joint Probability Density Function and Computation of Marginal PDF

2) Discussion of the discrete and continuous cases with graphical displays

3) Computation of Probabilities related to joint distribution with examples

4) Independence of random variables: f(x, y) = f_X(x) f_Y(y)

5) Computation of the covariance between X and Y: Cov(X, Y) = E(XY) - E(X)E(Y)

6) Independence implies Cov(X, Y) = 0; definition of correlation (see the sketch after this list)
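
A simulation sketch of items 4 to 6: for independent X and Y the empirical value of E(XY) - E(X)E(Y) is near zero; the choice of distributions is illustrative.

```r
set.seed(11)
n <- 1e5
x <- rnorm(n)                                # X ~ N(0, 1)
y <- rexp(n)                                 # Y ~ Exp(1), independent of X
mean(x * y) - mean(x) * mean(y)              # E(XY) - E(X)E(Y), close to 0
cov(x, y)                                    # built-in estimate, close to 0
cor(x, y)                                    # correlation, close to 0
```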

Link to the Video: Click Here


Lecture VIII (2 Hours):

1) Problem discussion on joint density functions, conditional density functions, independence, and computation of covariance

2) Principle of least squares and least-squares line fitting (see the sketch after this list).
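
A minimal sketch of least-squares line fitting with lm() on simulated data; the true intercept 2 and slope 3 are illustrative.

```r
# Fit a least-squares line y = a + b*x to noisy data.
set.seed(5)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)                   # true intercept 2, slope 3
fit <- lm(y ~ x)                             # minimizes the sum of squared residuals
coef(fit)                                    # estimates close to (2, 3)
plot(x, y)
abline(fit, col = "red")
```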

Link to the Video: Click Here


Lecture IX (1 Hour):

1) Discussion of the normal distribution

2) Basic description of functions of one variable and two variables

3) Plotting and basic geometric ideas

Link to the Video: Click Here


Lecture X (2 Hours):

1) Testing of Hypothesis

2) Distinction between population and sample

3) Sampling from the normal population

4) Sample mean and sample variance and their sampling distributions

5) Chi-square density function and its connection to the Gamma density function

6) Computation of mean and variance of the sample mean and sample variance

7) General approach for testing of hypotheses (see the sketch after this list)
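
A sketch of the general testing workflow using a one-sample t-test; the simulated data and the hypothesized mean are illustrative.

```r
# Test H0: mu = 5 against H1: mu != 5 for a normal sample.
set.seed(9)
x <- rnorm(30, mean = 5.4, sd = 1)           # illustrative data
t.test(x, mu = 5)                            # t statistic, p-value, and 95% CI
# Reject H0 at level 0.05 when the p-value is below 0.05; note that the
# 95% confidence interval excludes 5 exactly when H0 is rejected.
```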

Link to the Video: Click Here