1. Concepts & Definitions
1.1. Example of random variables
1.2. Probability of events to random variables
1.4. Discrete uniform distribution of probability
1.5. Bernoulli distribution of probability
1.6. Binomial distribution of probability
1.7. Hypergeometric distribution of probability
2. Problem & Solution
Imagine an inspection process that looks for containers with a problem. Each inspection can be modeled as a trial that either finds a problem (success) or does not (failure). Once a container is selected, it is not returned to the initial pool of containers. This is called sampling without replacement and is illustrated in the next figure with N = 10 (population size), K = 5 (number of successes in the population), n = 4 (sample size), and x = 2 (number of successes in the sample, i.e., red containers found).
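For the figure's numbers, the probability of drawing exactly x = 2 problem containers can be checked by counting combinations. The sketch below uses only the Python standard library; the helper name prob_exact is an assumption introduced here for illustration and does not come from the original notebook.

```python
from math import comb

def prob_exact(N, K, n, x):
    """Probability of exactly x problem containers in a sample of n
    drawn without replacement from N containers, K of which have a problem."""
    # ways to pick x problem containers * ways to pick the n - x good ones,
    # divided by all ways to pick n containers
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

p = prob_exact(N=10, K=5, n=4, x=2)
print(round(p, 4))  # 0.4762, i.e. 100 / 210
```

Summing prob_exact over x = 0, ..., 4 should give 1, which is a quick sanity check on the counting argument.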
This process is best modeled using a hypergeometric distribution. The following question could be formulated:
What is the probability of finding at least one container with a problem given that the sample has size n?
To answer this, we need P(X >= 1) = P(X = 1) + P(X = 2) + ... + P(X = min(n, K)),
or, since P(X = 0) + P(X >= 1) = 1, equivalently P(X >= 1) = 1 - P(X = 0).
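The two routes to P(X >= 1) can be compared numerically. This is a minimal sketch that reuses the small example (N = 10, K = 5, n = 4) and a hand-written PMF; the function name pmf is an assumption made for this illustration.

```python
from math import comb

def pmf(x, N, K, n):
    """P(X = x) when sampling n containers without replacement."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

N, K, n = 10, 5, 4

# Route 1: add up P(X = 1) + P(X = 2) + ... + P(X = min(n, K))
direct = sum(pmf(x, N, K, n) for x in range(1, min(n, K) + 1))

# Route 2: use the complement, 1 - P(X = 0)
complement = 1 - pmf(0, N, K, n)

print(direct, complement)  # both should agree up to floating-point error
```

The complement route needs a single PMF evaluation regardless of n, which is why it is the one used in the rest of this section.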
How does P(X >= 1) probability change as n increases?
To answer the first question numerically, the hypergeometric distribution can be used with the following parameters: N = 1000 (population size), K = 5 (number of successes in the population), and n = 2 (sample size), to compute:
• The probability that the inspection finds exactly zero containers with a problem: P(X = 0).
• The probability that the inspection finds at least one container with a problem: P(X >= 1) = 1 - P(X = 0).
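As a cross-check that does not depend on any library, P(X = 0) can also be written as a product over the draws: each draw must miss every remaining problem container. The sketch below assumes that reading of the process; the function name prob_no_problem is hypothetical.

```python
def prob_no_problem(N, K, n):
    """P(X = 0): all n draws (without replacement) avoid the K problem containers."""
    p = 1.0
    for i in range(n):
        # i containers are already removed, all of them problem-free
        p *= (N - K - i) / (N - i)
    return p

p0 = prob_no_problem(N=1000, K=5, n=2)
print(f"P(X = 0)  = {p0:.8f}")      # 0.99002002
print(f"P(X >= 1) = {1 - p0:.8f}")  # 0.00997998
```

For n = 2 this is simply (995/1000) * (994/999).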
What happens as the sample size increases from 0 to 200?
Instead of working through closed-form equations, these two questions will be answered with Python code.
The following code computes P(X = 0) for the numerical example using the PMF of the hypergeometric distribution, via the hypergeom.pmf method.
from scipy.stats import hypergeom
import matplotlib.pyplot as plt
import numpy as np
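# Compute P(X = 0): no container with a problem in the sample.
x = [0]
N = 1000  # population size
K = 5     # number of containers with a problem in the population
n = 2     # sample size
# scipy argument order: hypergeom(population size, successes in population, sample size)
y = hypergeom(N, K, n).pmf(x)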
print(' X |',x)
print('P(X) |',y)
X | [0]
P(X) | [0.99002002]
The following code computes the probability of finding at least one issue as a function of the sample size and plots the result.
from scipy.stats import hypergeom
import matplotlib.pyplot as plt
import numpy as np
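# Compute P(X >= 1) = 1 - P(X = 0) for each sample size n.
N = 1000  # population size
K = 5     # number of containers with a problem in the population
ns = list(range(0, 201))  # sample sizes 0..200
y = []
for n in ns:
    # scipy argument order: hypergeom(population size, successes in population, sample size)
    p_zero = hypergeom(N, K, n).pmf(0)
    y.append(1 - p_zero)
plt.plot(ns, y, '-r')
plt.xlabel('n (Sample Size)')
plt.ylabel('Probability of finding an issue, P(X >= 1)')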
plt.grid()
plt.show()
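The same curve can also be sampled at a few points and printed, which may help when no plotting backend is available. This is a sketch using only the standard library; the helper name prob_at_least_one and the chosen sample sizes are assumptions made for illustration.

```python
from math import comb

def prob_at_least_one(N, K, n):
    """P(X >= 1) = 1 - P(X = 0) for a sample of n drawn without replacement."""
    return 1 - comb(N - K, n) / comb(N, n)

N, K = 1000, 5
# Evaluate the curve on the same range as the plot, 0..200
curve = [prob_at_least_one(N, K, n) for n in range(0, 201)]

for n in (0, 2, 50, 100, 150, 200):
    print(f"n = {n:3d}  P(X >= 1) = {curve[n]:.4f}")
```

The printed values should rise steadily with n, starting from 0 when nothing is inspected.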
The complete code is available at the following link:
https://colab.research.google.com/drive/1hBvNNszy1Lq8fOYmQ9uh-SOnbjzC--Rm?usp=sharing