1. Concepts & Definitions
1.1. Experiment, observation, and sample space
1.2. Sample space: Venn and Tree diagram
1.3. Simple and composite events
1.4. Three definitions of probability
1.5. Law of large numbers and its consequences
1.6. Frequency and empirical probability
2. Problem & Solution
2.4. Frequency of categories from tables
2.5. Simple and marginal probabilities
2.6. Conditional probabilities
Conditional probability is the probability that an event occurs given that another event has occurred. If A and B are events, the conditional probability of A given B is written P(A|B), which is read as: “the probability that event A occurs given that event B has already occurred.”
An important aspect of conditional probability is that, once event B has happened or is known, the sample space is restricted to the outcomes in B, which can change the probability of A. The relative-frequency definition of probability can then be applied within this restricted sample space.
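The restriction of the sample space can be written compactly. The following is a sketch in standard notation, assuming P(B) > 0; the symbol n(·), the number of observed outcomes in an event, is introduced here only for illustration, and the second equality holds when probabilities are computed as relative frequencies:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)}, \qquad P(B) > 0
```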
To illustrate the properties of both marginal and conditional probability, an experiment is proposed. The next figure illustrates an experiment that collects data about product flow through a certain process at a port and classifies each product according to two categories: flow direction (importation or exportation) and process time (≤ 5 hours or > 5 hours).
The next table summarizes the results after one hundred values were collected for the process.
Before computing probabilities, it is necessary to define the following events for a product picked at random from the process, according to the data in the table:
• I - event of finding an importation product,
• E - event of finding an exportation product,
• L - event of finding a product that took less than or equal to five hours in the process,
• G - event of finding a product that took more than five hours in the process.
Several questions about conditional probability can now be answered by assuming that event L or event G has occurred and restricting the sample space accordingly:
Event L occurred, so only products with a process time of 5 hours or less are considered:
P(I|L) = 15/60 = 0.25
P(E|L) = 45/60 = 0.75
Event G occurred, so only products with a process time of more than 5 hours are considered:
P(I|G) = 4/40 = 0.10
P(E|G) = 36/40 = 0.90
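The restriction of the sample space used above can be reproduced with pandas. The following is a minimal sketch, not the notebook's code: the cell counts are taken from the numerators of the calculations above (the table itself is not reproduced in this text), and the names counts and cond_given_time are illustrative.

```python
import pandas as pd

# Cell counts inferred from the numerators in the text (assumption):
# rows = flow direction (I, E), columns = process-time event (L, G).
counts = pd.DataFrame(
    {"L": [15, 45], "G": [4, 36]},
    index=["I", "E"],
)

# Conditioning on L or G restricts the sample space to that column,
# so every count is divided by its own column total (60 for L, 40 for G).
cond_given_time = counts.div(counts.sum(axis=0), axis="columns")

print("P(I|L) =", cond_given_time.loc["I", "L"])
print("P(E|L) =", cond_given_time.loc["E", "L"])
print("P(I|G) =", cond_given_time.loc["I", "G"])
print("P(E|G) =", cond_given_time.loc["E", "G"])
```

Each column of cond_given_time sums to 1, because the restricted sample space contains only importation and exportation products.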
The following steps serve as a guideline for extracting data to compute simple and conditional probabilities in Python:
Load the notebook with the commands developed in section 2.5 (click on the link):
https://colab.research.google.com/drive/1T7SaI0gEDf11FEKMB7cXIPNRyHOyaTkC?usp=sharing
A second, simple situation arises in a port where importation containers can be classified by whether an issue was found, according to their ProductCode (another option would be to classify them by origin, using the port's historical records). For teaching purposes, suppose the container type can be Vegetable or FoodProduct. Let's work with products classified by ProductCode and by whether or not they have issues:
Before computing probabilities, it is necessary to define the following events for a product picked at random from the process, according to the data in the table:
• V - event of finding a vegetable product,
• F - event of finding a food product,
• N - event of finding a product without an issue,
• I - event of finding a product with an issue.
Several questions about conditional probability can now be answered:
P(N|V) = 1900/2000 = 0.95
P(I|V) = 100/2000 = 0.05
P(N|F) = 2900/3000 ≈ 0.97
P(I|F) = 100/3000 ≈ 0.03
The previous manual calculations can also be carried out in Python, as in the following code:
import pandas as pd
# Contingency table: rows = inspection result, columns = product type
col = ['Vegetable', 'FoodProduct']
row = ['NoIssues', 'Issues']
dat = [[1900, 2900], [100, 100]]
df_impp = pd.DataFrame(data=dat, index=row, columns=col)
# Add the row totals (column 'SumI') and the column totals (row 'SumP')
df_impp.loc[:,'SumI'] = df_impp.sum(axis=1)
df_impp.loc['SumP',:] = df_impp.sum(axis=0)
# Marginal probabilities: each total divided by the grand total
PNI = df_impp.loc['NoIssues','SumI']/df_impp.loc['SumP','SumI']
PI = df_impp.loc['Issues','SumI']/df_impp.loc['SumP','SumI']
PV = df_impp.loc['SumP','Vegetable']/df_impp.loc['SumP','SumI']
PF = df_impp.loc['SumP','FoodProduct']/df_impp.loc['SumP','SumI']
print("PNI = ",PNI)
print("PI = ",PI)
print("PV = ",PV)
print("PF = ",PF)
The following probability values will appear:
PNI = 0.96
PI = 0.04
PV = 0.4
PF = 0.6
The following code computes the conditional probabilities by dividing each cell count by the total of the conditioning column:
# PVN = P(NoIssues|Vegetable), PFN = P(NoIssues|FoodProduct)
PVN = df_impp.loc['NoIssues','Vegetable']/df_impp.loc['SumP','Vegetable']
PFN = df_impp.loc['NoIssues','FoodProduct']/df_impp.loc['SumP','FoodProduct']
# PVI = P(Issues|Vegetable), PFI = P(Issues|FoodProduct)
PVI = df_impp.loc['Issues','Vegetable']/df_impp.loc['SumP','Vegetable']
PFI = df_impp.loc['Issues','FoodProduct']/df_impp.loc['SumP','FoodProduct']
print("P(NoIssues|veg) = ",PVN)
print("P(Issues|veg) = ",PVI)
print("P(NoIssues|food) = ",PFN)
print("P(Issues|food) = ",PFI)
The following probability values will appear (shown here rounded to two decimal places):
P(NoIssues|veg) = 0.95
P(Issues|veg) = 0.05
P(NoIssues|food) = 0.97
P(Issues|food) = 0.03
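When the data arrive as one record per container rather than as a table of counts, the same conditional probabilities can be obtained with pandas.crosstab. The following is a minimal sketch under stated assumptions: the record-level data are rebuilt artificially from the counts above, and the column name Status and the variable names cells, records, and cond are hypothetical.

```python
import pandas as pd

# Hypothetical record-level data rebuilt from the counts in the text:
# one row per container, labelled by product type and inspection result.
cells = {
    ("Vegetable", "NoIssues"): 1900,
    ("Vegetable", "Issues"): 100,
    ("FoodProduct", "NoIssues"): 2900,
    ("FoodProduct", "Issues"): 100,
}
records = pd.DataFrame(
    [(product, status) for (product, status), n in cells.items() for _ in range(n)],
    columns=["ProductCode", "Status"],
)

# normalize="columns" divides each cell by its column total,
# which gives P(Status | ProductCode) for every combination.
cond = pd.crosstab(records["Status"], records["ProductCode"], normalize="columns")

print(cond.round(2))
```

Passing normalize="index" instead would condition on the inspection result, giving P(ProductCode | Status).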
The Python code with all the steps is collected in this Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1y6e6aXb-CFH1mecWl_pvk5uvVMrVqB7F?usp=sharing