1. Concepts & Definitions
1.1. Experiment, observation, and sample space
1.2. Sample space: Venn and Tree diagram
1.3. Simple and composite events
1.4. Three definitions of probability
1.5. Law of large numbers and its consequences
1.6. Frequency and empirical probability
2. Problem & Solution
2.4. Frequency of categories from tables
2.5. Simple and marginal probabilities
2.6. Conditional probabilities
The intersection of events A and B, denoted A ∩ B, is the collection of all outcomes that are elements of both sets A and B. It corresponds to combining the descriptions of the two events using the word “and.” The probability of the intersection can be computed with the following equation (the multiplication rule):
P(A∩B) = P(A) x P(B|A)
This equation can be read as: “The probability that both A and B occur equals the probability that A occurs multiplied by the conditional probability that B occurs given that A has occurred.”
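The equation follows directly from the definition of conditional probability. The short derivation below, which assumes P(A) > 0 and P(B) > 0, also shows that the roles of A and B can be swapped:

```latex
\begin{align*}
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
  \quad &\Longrightarrow \quad P(A \cap B) = P(A)\,P(B \mid A), \\
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
  \quad &\Longrightarrow \quad P(A \cap B) = P(B)\,P(A \mid B).
\end{align*}
```

Either form gives the same joint probability, so the choice depends only on which conditional probability is available.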
To illustrate marginal and conditional probability, consider the following experiment. The next figure illustrates an experiment that collects data about the product flow of a certain process at a port and classifies each product by two categories: flow direction (importation or exportation) and process time (≤ 5 hours or > 5 hours).
The next table summarizes the result after the collection of one hundred values for the process.
Before computing probabilities, define the following events for a product picked at random from the process, based on the data in the table:
• I - event of finding an importation product,
• E - event of finding an exportation product,
• L - event of finding a product that took less than or equal to five hours in the process,
• G - event of finding a product that took more than five hours in the process.
Recall the computation of simple (marginal) probabilities from Track 04 - Section 2.5:
P(I) = (15+4)/100 = 19/100 = 0.19
P(E) = (45+36)/100 = 0.81
P(L) = (15+45)/100 = 0.60
P(G) = (4+36)/100 = 0.40
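The four marginal probabilities can also be obtained programmatically. The sketch below assumes the four cell counts implied by the computations above (15, 4, 45 and 36), since the table itself is not reproduced here; the dictionary layout and the helper name `marginal` are illustrative choices, not part of the course material.

```python
from fractions import Fraction

# Cell counts inferred from the computations in the text (assumption: the
# table holds exactly these four cells). Keys are (flow, time), with
# I/E = importation/exportation and L/G = "<= 5 hours" / "> 5 hours".
counts = {("I", "L"): 15, ("I", "G"): 4, ("E", "L"): 45, ("E", "G"): 36}
total = sum(counts.values())  # 100 products

def marginal(label, position):
    """Sum the counts whose key has `label` at `position`, over the total."""
    favourable = sum(n for key, n in counts.items() if key[position] == label)
    return Fraction(favourable, total)

p_I, p_E = marginal("I", 0), marginal("E", 0)
p_L, p_G = marginal("L", 1), marginal("G", 1)

for name, p in [("P(I)", p_I), ("P(E)", p_E), ("P(L)", p_L), ("P(G)", p_G)]:
    print(f"{name} = {float(p):.2f}")
```

Using `Fraction` keeps the arithmetic exact; the values are converted to decimals only for display.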
Recall the computation of conditional probabilities from Track 04 - Section 2.6:
Given that event L occurred, consider only the 60 products with a process time of 5 hours or less:
P(I|L) = (15)/60 = 0.25
P(E|L) = (45)/60 = 0.75
Given that event G occurred, consider only the 40 products with a process time of more than 5 hours:
P(I|G) = (4)/40 = 0.10
P(E|G) = (36)/40 = 0.90
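Conditioning amounts to restricting the count to one column of the table and dividing by that column's total. A minimal sketch of this idea, again assuming the four cell counts implied by the text; the function name `conditional` is hypothetical.

```python
from fractions import Fraction

# Cell counts inferred from the computations in the text (assumption).
counts = {("I", "L"): 15, ("I", "G"): 4, ("E", "L"): 45, ("E", "G"): 36}

def conditional(flow, time):
    """P(flow | time): share of `flow` among the products in the `time` column."""
    column_total = sum(n for (_, t), n in counts.items() if t == time)
    return Fraction(counts[(flow, time)], column_total)

p_I_given_L = conditional("I", "L")  # 15/60
p_E_given_L = conditional("E", "L")  # 45/60
p_I_given_G = conditional("I", "G")  # 4/40
p_E_given_G = conditional("E", "G")  # 36/40

for name, p in [("P(I|L)", p_I_given_L), ("P(E|L)", p_E_given_L),
                ("P(I|G)", p_I_given_G), ("P(E|G)", p_E_given_G)]:
    print(f"{name} = {float(p):.2f}")
```

Within each condition the two probabilities add up to 1, because every product in the column is either an importation or an exportation product.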
Applying the equation for the intersection to these values gives the joint probabilities:
P(G∩I) = P(G) x P(I|G) = 0.40 x 0.10 = 0.04
P(G∩E) = P(G) x P(E|G) = 0.40 x 0.90 = 0.36
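As a sanity check, the joint probability obtained from the multiplication rule should match the relative frequency read directly from the cell (4 out of 100), and it should not matter which event is used as the condition. The sketch below assumes the counts implied by the text; all variable names are illustrative.

```python
from fractions import Fraction

total = 100
n_I_and_G = 4        # importation products that took more than 5 hours
n_G = 4 + 36         # all products that took more than 5 hours
n_I = 15 + 4         # all importation products

p_G = Fraction(n_G, total)
p_I = Fraction(n_I, total)
p_I_given_G = Fraction(n_I_and_G, n_G)
p_G_given_I = Fraction(n_I_and_G, n_I)

joint_direct = Fraction(n_I_and_G, total)  # read straight from the cell
joint_via_G = p_G * p_I_given_G            # P(G) x P(I|G)
joint_via_I = p_I * p_G_given_I            # P(I) x P(G|I)

print(float(joint_direct), float(joint_via_G), float(joint_via_I))
```

All three routes give 0.04, which is the value computed by hand above.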
The following steps serve as a guide for using simple and conditional probabilities to check, with Python code, whether two events are independent:
Load the notebook with the commands developed in step 2.7 (click on the link):
https://colab.research.google.com/drive/1q91Pat5iO7d6-2QMfm73lSjTk1ngK40N?usp=sharing
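For orientation, an independence check can be sketched in a few lines. Two events A and B are independent when P(A∩B) = P(A) x P(B), or equivalently when P(A|B) = P(A). The code below is not taken from the notebook; the function name `are_independent` and the tolerance are assumptions made for illustration, and the inputs are the values computed by hand in this section.

```python
import math

# Values computed by hand in this section
p_I = 0.19          # P(I)
p_G = 0.40          # P(G)
p_I_given_G = 0.10  # P(I|G)
p_I_and_G = p_G * p_I_given_G  # multiplication rule, about 0.04

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

print("P(I) x P(G) =", round(p_I * p_G, 4))
print("P(I and G)  =", round(p_I_and_G, 4))
print("Independent?", are_independent(p_I, p_G, p_I_and_G))
```

Here P(I) x P(G) = 0.076 differs from P(I∩G) = 0.04, so under this data the flow direction and the process time are not independent; the same conclusion follows from P(I|G) = 0.10 being different from P(I) = 0.19.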
Consider a second simple situation at a port: importation containers can be divided according to whether an issue was found, grouped by their ProductCode (another option would be to group them by origin, using the port's historical records). For didactic purposes, suppose the container type can be either Vegetable or FoodProduct. The products are classified by ProductCode and by whether or not they have issues:
Before computing probabilities, define the following events for a product picked at random from the process, based on the data in the table:
• V - event of finding a vegetable product,
• F - event of finding a food product,
• N - event of finding a product without an issue,
• I - event of finding a product with an issue.
Recall from previous sections the simple probabilities of the events:
P(V) = (1900+100)/5000 = 2000/5000 = 0.40
P(F) = (2900+100)/5000 = 0.60
P(N) = (4800)/5000 = 0.96
P(I) = (200)/5000 = 0.04
Recall from previous sections the conditional probabilities of the events:
P(N|V) = (1900)/2000 = 0.95
P(I|V) = (100)/2000 = 0.05
P(N|F) = (2900)/3000 ≈ 0.97
P(I|F) = (100)/3000 ≈ 0.03
The joint probabilities P(V∩N) and P(V∩I) can now be computed using the following equations:
P(V∩N) = P(V)*P(N|V) = 0.40*0.95 = 0.38
P(V∩I) = P(V)*P(I|V) = 0.40*0.05 = 0.02
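The same rule can be applied to all four cells at once and checked against the relative frequencies. This is a minimal sketch that assumes the four counts used in the computations above (1900, 2900, 100 and 100); the helper name `joint_by_rule` is hypothetical.

```python
from fractions import Fraction

# Keys are (product, status): V/F = vegetable/food product,
# N/I = without/with an issue. Counts taken from the computations above.
counts = {("V", "N"): 1900, ("F", "N"): 2900, ("V", "I"): 100, ("F", "I"): 100}
total = sum(counts.values())  # 5000 products

def joint_by_rule(product, status):
    """P(product and status) = P(product) * P(status | product)."""
    n_product = sum(n for (p, _), n in counts.items() if p == product)
    p_product = Fraction(n_product, total)
    p_status_given_product = Fraction(counts[(product, status)], n_product)
    return p_product * p_status_given_product

joints = {key: joint_by_rule(*key) for key in counts}

for (product, status), p in joints.items():
    # The rule must agree with the direct relative frequency count/total.
    assert p == Fraction(counts[(product, status)], total)
    print(f"P({product} and {status}) = {float(p):.2f}")
```

The four joint probabilities (0.38, 0.58, 0.02 and 0.02) add up to 1, since every product falls into exactly one cell.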
The previous manual calculations can be reproduced in Python with the following code:
import pandas as pd
# Contingency table: rows are the issue status, columns are the product type
col = ['Vegetable', 'FoodProduct']
row = ['NoIssues', 'Issues']
dat = [[1900, 2900], [100, 100]]
df_impp = pd.DataFrame(data=dat, index=row, columns=col)
# Add the row totals (column 'SumI') and the column totals (row 'SumP')
df_impp.loc[:,'SumI'] = df_impp.sum(axis=1)
df_impp.loc['SumP',:] = df_impp.sum(axis=0)
# Simple (marginal) probabilities: a row or column total over the grand total
PNI = df_impp.loc['NoIssues','SumI']/df_impp.loc['SumP','SumI']  # P(N)
PI = df_impp.loc['Issues','SumI']/df_impp.loc['SumP','SumI']  # P(I)
PV = df_impp.loc['SumP','Vegetable']/df_impp.loc['SumP','SumI']  # P(V)
PF = df_impp.loc['SumP','FoodProduct']/df_impp.loc['SumP','SumI']  # P(F)
print("PNI = ",PNI)
print("PI = ",PI)
print("PV = ",PV)
print("PF = ",PF)
The following probability values will appear:
PNI = 0.96
PI = 0.04
PV = 0.4
PF = 0.6
The following code computes the conditional probabilities using the definition of conditional probability:
# Each conditional probability divides a cell count by the total of its column
PVN = df_impp.loc['NoIssues','Vegetable']/df_impp.loc['SumP','Vegetable']  # P(N|V)
PFN = df_impp.loc['NoIssues','FoodProduct']/df_impp.loc['SumP','FoodProduct']  # P(N|F)
PVI = df_impp.loc['Issues','Vegetable']/df_impp.loc['SumP','Vegetable']  # P(I|V)
PFI = df_impp.loc['Issues','FoodProduct']/df_impp.loc['SumP','FoodProduct']  # P(I|F)
print("P(NoIssues|veg) = ",PVN)
print("P(Issues|veg) = ",PVI)
print("P(NoIssues|food) = ",PFN)
print("P(Issues|food) = ",PFI)
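The following probability values will appear (the last two are printed unrounded; they correspond to approximately 0.97 and 0.03 in the manual calculation):
P(NoIssues|veg) = 0.95
P(Issues|veg) = 0.05
P(NoIssues|food) = 0.9666666666666667
P(Issues|food) = 0.03333333333333333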
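As an alternative to dividing cell by cell, the whole table of conditional probabilities can be produced in one step by dividing each column by its own total. The sketch below rebuilds the table without the 'SumI' and 'SumP' totals so that it runs on its own; the names `counts` and `conditional` are illustrative and do not appear in the code above.

```python
import pandas as pd

# Same counts as above, without the added total row and column
counts = pd.DataFrame(
    [[1900, 2900], [100, 100]],
    index=["NoIssues", "Issues"],
    columns=["Vegetable", "FoodProduct"],
)

# Divide every column by its own total: entry (row, column) becomes
# P(row | column), for example P(NoIssues | Vegetable).
conditional = counts.div(counts.sum(axis=0), axis=1)

print(conditional.round(4))
```

Each column of `conditional` adds up to 1, which is a quick way to confirm that the division was applied along the intended axis.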
The following code computes the joint probabilities P(V∩N) and P(V∩I):
PVAN = PV*PVN
PVAI = PV*PVI
print('P(Veg and NoIssues) = ',PVAN)
print('P(Veg and Issues) = ',PVAI)
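The following probability values will appear:
P(Veg and NoIssues) = 0.38
P(Veg and Issues) = 0.020000000000000004
The extra digits in the last value are floating-point rounding error; the exact result is 0.02.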
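Binary floating-point numbers cannot represent most decimal fractions exactly, which is why a product such as 0.4 x 0.05 may print with trailing digits. A short sketch of three common ways to deal with this: rounding for display, comparing with a tolerance, and using exact fractions. The variable names are illustrative.

```python
import math
from fractions import Fraction

p_joint = 0.4 * 0.05  # same product as PV*PVI, written with literals

# 1) Round only when displaying the result
print(round(p_joint, 4))

# 2) Compare with a tolerance instead of using ==
print(math.isclose(p_joint, 0.02))

# 3) Keep the arithmetic exact by working with the original counts
exact = Fraction(2000, 5000) * Fraction(100, 2000)
print(exact, "=", float(exact))
```

Rounding is usually enough for reporting probabilities; `math.isclose` is the safer choice whenever two computed probabilities have to be compared, for example in an independence check.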
The Python code for the previous calculations is also available in the following notebook:
https://colab.research.google.com/drive/1lsB-_DjY2GsQZIiPsREvjX78pO38Ifaa?usp=sharing