The normal distribution is a probability distribution that describes how the values of a variable are likely to be distributed. It is symmetric: most observations cluster around a central peak, where the mean, median and mode coincide. If we know the mean and standard deviation of the population being sampled, the normal distribution can be used to calculate the probability that an outcome falls within a given range. Values further from the population mean are less likely, and the density tapers off equally in both directions, so increasingly extreme values in either tail become increasingly improbable.
Finding the probability of an event, given a known mean and standard deviation for the population, involves finding the area under the density curve over the relevant range of values. We assume here that the mean is 0 and the standard deviation is 1.
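As a minimal sketch of the area-under-the-curve idea, base R's pnorm() returns the area under the normal density to the left of a value, so the probability of an interval is the difference between two such areas. The interval endpoints below (-1 and 0.5) are arbitrary values chosen purely for illustration.

```r
# standard normal: mean = 0, standard deviation = 1 (the pnorm defaults)
lower <- -1    # illustrative lower bound (assumption)
upper <- 0.5   # illustrative upper bound (assumption)

# P(lower <= Z <= upper) is the area under the density between the bounds
p_interval <- pnorm(upper) - pnorm(lower)

# the same area obtained by numerically integrating the density
p_numeric <- integrate(dnorm, lower, upper)$value

print(p_interval)
print(p_numeric)
```

Both approaches should agree up to numerical error, which shows that pnorm() is simply a shortcut for integrating the density.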
A Z-score, or standard score, represents the number of standard deviations a data point lies from the mean of the group. Z-scores are standardised values: it is routine when reading probability tables to convert to Z-scores, which always have a mean of 0 and a standard deviation of 1. This makes interpretation more straightforward. A positive Z-score means the value is above the mean, whereas a negative Z-score means it is below the mean. Graphs from the Basic Business Statistics textbook.
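The conversion to Z-scores can be sketched in a few lines of R. The population mean (100), standard deviation (15) and observations used here are hypothetical values chosen for illustration, not data from the text.

```r
# hypothetical population parameters and observations (illustrative only)
mu    <- 100
sigma <- 15
x     <- c(85, 100, 130)

# z = (observation - mean) / standard deviation
z <- (x - mu) / sigma
print(z)

# probability of a value at or below each observation, via the Z-scores
p_from_z <- pnorm(z)
print(p_from_z)

# pnorm() on the original scale gives the same probabilities
p_from_x <- pnorm(x, mean = mu, sd = sigma)
print(p_from_x)
```

The first observation sits below the mean and gets a negative Z-score, the second equals the mean and scores zero, and the third is above the mean and scores positive.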
The general form of its probability density function is f(x) = 1 / (σ √(2π)) · exp(-(x - μ)^2 / (2σ^2)). The parameter μ is the mean of the distribution (and also its median and mode), and σ is its standard deviation. The variance of the distribution is σ^2. A random variable, X, with a Gaussian distribution is said to be normally distributed and is called a normal deviate. Normal distributions are critical in data analytics and are frequently employed in the sciences to represent real-valued random variables whose distributions might not be known. The Normal or Gaussian distribution is a continuous probability distribution that has a bell-shaped probability density function (Gaussian function), or less formally a bell curve. For a theoretical normal distribution curve, the rule of thumb is that about 68% of the population or sample values lie within one standard deviation σ of the mean μ. Similarly, about 95% of the values lie within two standard deviations (more precisely 1.96) on either side of the mean. See the video below and check out the Excel file.
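As a rough check of the density and the rule of thumb, the sketch below writes the normal density out by hand, using the standard formula 1 / (sd · sqrt(2π)) · exp(-(x - mean)^2 / (2 · sd^2)), and compares it with R's built-in dnorm(). The helper names normal_pdf and within_k are hypothetical and introduced only for this example.

```r
# normal density written out from the formula (helper name is hypothetical)
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# the hand-written density agrees with the built-in dnorm()
x <- c(-1.96, 0, 1.96)
print(all.equal(normal_pdf(x), dnorm(x)))

# share of the distribution within k standard deviations of the mean
within_k <- function(k) pnorm(k) - pnorm(-k)
shares <- within_k(c(1, 1.96, 2))
print(round(shares, 4))
```

The shares come out at roughly 68.3% for one standard deviation, 95.0% for 1.96 and 95.4% for two, which is why 1.96 is quoted as the more precise multiplier for the 95% rule.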
Below, I have included some graphs from Wikipedia that show the distribution of a variable that follows the normal distribution.
# Introduction to Econometrics with R
# Probability Distributions Chapter 2
# https://www.econometrics-with-r.org/2-1-random-variables-and-probability-distributions.html
# simulate a single roll of a fair die
sample(1:6, 1)
# Table 2.1: PDF and CDF of a Dice Roll
# Outcome                   1    2    3    4    5    6
# Probability              1/6  1/6  1/6  1/6  1/6  1/6
# Cumulative Probability   1/6  2/6  3/6  4/6  5/6   1
# generate the vector of probabilities
probability <- rep(1/6, 6)
# plot the probabilities
plot(probability,
     main = "Probability Distribution",
     xlab = "outcomes")
# generate the vector of cumulative probabilities
cum_probability <- cumsum(probability)
# plot the cumulative probabilities
plot(cum_probability,
     xlab = "outcomes",
     main = "Cumulative Probability Distribution")
# compute mean of natural numbers from 1 to 6
mean(1:6)
# set seed for reproducibility
set.seed(2)
# roll a die three times in a row
sample(1:6, 3, replace = TRUE)
# set seed for reproducibility
set.seed(1)
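# compute the sample mean of 10000 dice rolls
mean(sample(1:6,
            10000,
            replace = TRUE))
# compute the sample variance of the numbers 1 to 6
# (var() divides by n - 1, so this returns 3.5 rather than the population variance 35/12)
var(1:6)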
# draw a plot of the N(0,1) PDF
curve(dnorm(x),
      xlim = c(-3.5, 3.5),
      ylab = "Density",
      main = "Standard Normal Density Function")
# compute density at x=-1.96, x=0 and x=1.96
dnorm(x = c(-1.96, 0, 1.96))
# plot the standard normal CDF
curve(pnorm(x),
      xlim = c(-3.5, 3.5),
      ylab = "Probability",
      main = "Standard Normal Cumulative Distribution Function")
# compute the probability P(-1.96 <= Z <= 1.96), approximately 0.95
1 - 2 * (pnorm(-1.96))