1.2. Normal distribution of probability


Normal or Gaussian distribution

It is denoted as X ∼ N(μ, σ²) and is read as "X is a continuous random variable that follows a Normal distribution with parameters μ and σ²", where μ is the mean and σ² is the variance. Examples of quantities that approximately follow a Normal distribution are the heights of people, students' exam scores, IQ scores, the packaging weights of cereal and cookies, the amount of milk in a bottle, and the lifetime of equipment such as a light bulb or a TV.
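In SciPy, which the code later on this page uses, a Normal distribution is created with loc = μ and scale = σ, that is, the standard deviation rather than the variance σ². A minimal sketch, with the values μ = 80 and σ² = 100 chosen only for illustration:

```python
import math
from scipy.stats import norm

mu = 80.0                      # mean (illustrative value)
variance = 100.0               # sigma^2 (illustrative value)
sigma = math.sqrt(variance)    # scipy's `scale` expects the standard deviation

X = norm(loc=mu, scale=sigma)  # frozen N(mu, sigma^2) distribution

print(X.mean())  # 80.0
print(X.var())   # 100.0
print(X.std())   # 10.0
```

Passing the variance as `scale` by mistake would silently describe N(80, 10000) instead of N(80, 100).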

Properties of Normal distribution are:

• The random variable takes values from −∞ to +∞.

• The probability associated with any single value is zero.

• The density curve is bell-shaped and symmetric about x = μ: 50% of the probability lies to the left of μ and 50% lies to the right.

• The total area under the curve (AUC) is 1.

• All the measures of central tendency coincide, i.e., mean = median = mode.
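These properties can be checked numerically. A minimal sketch, assuming the illustrative parameters μ = 80 and σ = 10 and using scipy.integrate.quad for the area (the integration range μ ± 10σ is a choice of this sketch, not part of the definition):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 80.0, 10.0            # illustrative parameters
X = norm(loc=mu, scale=sigma)

# Total area under the density; mu +/- 10 sigma covers all but ~1.5e-23 of it
area, _ = quad(X.pdf, mu - 10 * sigma, mu + 10 * sigma)

# Symmetry about mu: half of the probability lies on each side
left_half = X.cdf(mu)

# Mode: locate the peak of the density on a fine grid (step 0.01)
grid = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 10001)
mode_estimate = grid[np.argmax(X.pdf(grid))]

# Single value: P(mu - h <= X <= mu + h) shrinks toward 0 as h -> 0
shrinking = [X.cdf(mu + h) - X.cdf(mu - h) for h in (1.0, 0.1, 0.001)]

print(area, left_half)
print(X.mean(), X.median(), mode_estimate)
print(shrinking)
```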

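The probability density function (PDF) of the Normal distribution is given by the following equation.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < +\infty$$

Where μ is the mean of the Normal distribution, and σ is the standard deviation.

The cumulative distribution function (CDF) of the Normal distribution is given by the following equation.

$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$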

This integral has no closed-form solution in elementary functions. Instead of solving it explicitly, cumulative probabilities are obtained numerically, either with computational commands or with numerical tables for a special case of the Normal distribution, called the standard normal distribution, which will be seen in the next section.
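To show what "solving numerically" means, the sketch below writes the PDF directly from the formula and approximates the CDF with the trapezoidal rule, then compares it with the closed form in terms of the error function, F(x) = (1 + erf(z/√2))/2 with z = (x − μ)/σ, evaluated by Python's math.erf. The function names and the choice of μ − 10σ as a stand-in for −∞ are assumptions of this sketch; the parameters are those of the Porsche 911 example below.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x) of N(mu, sigma^2), written directly from the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_trapezoid(x, mu, sigma, n=100_000):
    """Approximate F(x) by integrating the density with the trapezoidal rule.

    The lower limit -infinity is replaced by mu - 10*sigma, where the
    remaining tail area is negligible (about 7.6e-24).
    """
    a = mu - 10 * sigma
    if x <= a:
        return 0.0
    h = (x - a) / n
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, n):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

def normal_cdf_erf(x, mu, sigma):
    """Closed form via the error function: F(x) = (1 + erf(z / sqrt(2))) / 2."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 48125, 1600
for x in (46000, 49000):
    print(x, round(normal_cdf_trapezoid(x, mu, sigma), 4), round(normal_cdf_erf(x, mu, sigma), 4))
# 46000 0.0921 0.0921
# 49000 0.7078 0.7078
```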

Example of normal distribution application

An automotive guide indicates the Porsche 911 as the car that best retains its value. A new car costing US$87,500 is expected to be worth US$48,125 after 3 years of use. Assume that the price of all three-year-old Porsche 911s follows a Normal distribution with a mean of US$48,125 and a standard deviation of US$1,600. Find the following probabilities:

• A vehicle chosen at random has a selling price between US$46,000 and US$49,000.

• A vehicle chosen at random has a selling price higher than US$49,000.

Using the computational commands to obtain the Normal distribution cumulative probabilities with parameters μ = 48,125 and σ = 1,600:

• Probability of price between US$46,000 and US$49,000:

P(46,000 ≤ X ≤ 49,000) = P(X ≤ 49,000) − P(X ≤ 46,000) = 0.7078 − 0.0921 = 0.6157

• Probability of price higher than US$49,000:

Since P(X ≤ 49,000) + P(X ≥ 49,000) = 1, it follows that P(X ≥ 49,000) = 1 − P(X ≤ 49,000) = 1 − 0.7078 = 0.2922
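Both answers can be sanity-checked by simulation: draw many prices from N(48,125, 1,600²) and count how often they fall in each range. A minimal Monte Carlo sketch; the seed and the sample size of one million are arbitrary choices for this illustration, so the estimates only approximate the exact values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible
mu, sigma = 48125, 1600               # parameters from the example
n = 1_000_000                         # number of simulated vehicles

prices = rng.normal(loc=mu, scale=sigma, size=n)

# Fraction of simulated prices in each range estimates the probability
p_between = np.mean((prices >= 46000) & (prices <= 49000))
p_above = np.mean(prices > 49000)

print(f"P(46,000 <= X <= 49,000) ~ {p_between:.4f}")  # close to 0.6157
print(f"P(X > 49,000)           ~ {p_above:.4f}")    # close to 0.2922
```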

Computational Experiment Example

The following code shows how to evaluate the probability density function (PDF) of a Normal distribution with the norm.pdf command.

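from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# Array of values from 40 to 118 in steps of 2 (np.arange excludes the stop value 120)
x = np.arange(40, 120, 2)
mean = 80
std = 10

# Density of N(80, 10^2) evaluated at each value of x
y = norm.pdf(x, loc=mean, scale=std)

# Density curve (red line) with the evaluated points marked (blue circles)
plt.plot(x, y, 'r-', x, y, 'bo')
plt.show()

# The same density values drawn as a bar chart
plt.bar(x, y)
plt.show()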
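When reading these plots, note that norm.pdf returns density values, not probabilities. A minimal sketch, reusing the same grid and parameters (the σ = 0.1 case is an illustrative choice), shows that multiplying each density value by the spacing of 2 gives an approximate probability per bar, so the bars add up to roughly 1, and that a density can exceed 1:

```python
import numpy as np
from scipy.stats import norm

x = np.arange(40, 120, 2)   # same grid as above: 40, 42, ..., 118
mean, std = 80, 10
width = 2                    # spacing between consecutive x values

y = norm.pdf(x, loc=mean, scale=std)

# Each density value times the bar width approximates P(x - 1 <= X <= x + 1),
# so the sum approximates the total probability over the plotted range.
approx_total = np.sum(y * width)
print(approx_total)

# A narrow distribution has a peak density far above 1: densities are not probabilities.
print(norm.pdf(0, loc=0, scale=0.1))
```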

The following code shows how to evaluate and plot the cumulative distribution function (CDF) of a Normal distribution with the norm.cdf command. The CDF gives P(X ≤ x), which is the basis for computing probabilities over intervals.

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# Array of values from 40 to 118 in steps of 2 (np.arange excludes the stop value 120)
x = np.arange(40, 120, 2)
mean = 80
std = 10

# Cumulative probability P(X <= x) of N(80, 10^2) at each value of x
y = norm.cdf(x, loc=mean, scale=std)

# S-shaped CDF curve on a grid
plt.grid()
plt.plot(x, y)
plt.show()
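Interval probabilities follow from the CDF as P(a ≤ X ≤ b) = F(b) − F(a). A minimal sketch with the same parameters; the intervals μ ± σ and μ ± 2σ and the 95% level are illustrative choices, and norm.interval returns the central interval containing the requested probability:

```python
from scipy.stats import norm

mean, std = 80, 10   # same parameters as the plot above

# P(a <= X <= b) = F(b) - F(a)
p_1sd = norm.cdf(90, loc=mean, scale=std) - norm.cdf(70, loc=mean, scale=std)
p_2sd = norm.cdf(100, loc=mean, scale=std) - norm.cdf(60, loc=mean, scale=std)

# Central interval that contains 95% of the probability
low, high = norm.interval(0.95, loc=mean, scale=std)

print(round(p_1sd, 4), round(p_2sd, 4))  # 0.6827 0.9545
print(round(low, 2), round(high, 2))     # 60.4 99.6
```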

Applying the code to the numerical example

The following code computes the probabilities of the numerical example using the CDF of the Normal distribution through the norm.cdf command.

from scipy.stats import norm

x = [46000, 49000]
mean = 48125
std = 1600

# Cumulative probabilities P(X <= 46000) and P(X <= 49000)
y = norm.cdf(x, loc=mean, scale=std)
print(y)
# Output: [0.09206841 0.70776769]

# P(46000 <= X <= 49000) = F(49000) - F(46000)
pinterval = y[1] - y[0]
print('P(46000 <= X <= 49000) = ', pinterval)
# Output: P(46000 <= X <= 49000) = 0.6156992859925272

# Upper tail: P(X >= 49000) = 1 - F(49000)
pupper = 1 - y[1]
print('P(X >= 49000) = 1 - P(X <= 49000) = ', pupper)
# Output: P(X >= 49000) = 1 - P(X <= 49000) = 0.29223230610840834

The complete code of the previous examples is available at the following link:

https://colab.research.google.com/drive/1UYgTO-8cT7ws5m6C90LgwD8XwirzY3s1?usp=sharing
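For upper-tail probabilities, SciPy also provides the survival function norm.sf, which computes P(X > x) = 1 − F(x) directly. A minimal sketch with the example's parameters; the far-tail point μ + 10σ is an illustrative choice showing where 1 − cdf rounds to zero in floating point while sf still returns the tiny tail probability:

```python
from scipy.stats import norm

mean, std = 48125, 1600  # parameters of the Porsche 911 example

# Survival function: P(X > x) = 1 - F(x), computed directly
p_above = norm.sf(49000, loc=mean, scale=std)
print(round(p_above, 4))  # 0.2922

# Far in the tail, F(x) rounds to exactly 1.0, so 1 - cdf returns 0,
# while sf keeps the small tail probability (about 7.6e-24 at 10 sigma)
far = mean + 10 * std
one_minus_cdf = 1 - norm.cdf(far, loc=mean, scale=std)
sf_value = norm.sf(far, loc=mean, scale=std)
print(one_minus_cdf, sf_value)
```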
