1. Concepts & Definitions
1.1. Experiment, observation, and sample space
1.2. Sample space: Venn and Tree diagram
1.3. Simple and composite events
1.4. Three definitions of probability
1.5. Law of large numbers and its consequences
1.6. Frequency and empirical probability
2. Problem & Solution
2.4. Frequency of categories from tables
2.5. Simple and marginal probabilities
2.6. Conditional probabilities
A simple probability represents the probability of a single event, without taking any other event into account. It is written P(A), which reads: “the probability that event A happens.”
The term “marginal” comes from the fact that this probability can be found by summing the values in a table along its rows or columns and writing the sums in the margins of the table. Another way to compute it is to use the relative-frequency definition of probability: the frequency of event A divided by the number of repetitions of the experiment.
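The two ways of computing a simple probability described above can be written compactly. The following is only a notational sketch: the symbols n (number of repetitions), n_A (frequency of event A), and n_{ij} (count in row i, column j of a two-way table) are assumptions introduced here for illustration and do not come from the original text.

```latex
% Relative-frequency view: frequency of A over the number of repetitions
P(A) \approx \frac{n_A}{n}

% Marginal view for a two-way table with row events A_i and column events B_j:
% sum the counts along a row (or a column) and divide by the grand total
P(A_i) = \frac{\sum_{j} n_{ij}}{n}, \qquad
P(B_j) = \frac{\sum_{i} n_{ij}}{n}, \qquad
n = \sum_{i}\sum_{j} n_{ij}
```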
To illustrate the properties of both marginal and conditional probability, an experiment is proposed. The next figure illustrates an experiment that collects data about product flow through a certain process at a port and classifies each product according to two categories: flow direction (importation or exportation) and process time (≤ 5 hours or > 5 hours).
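The next table summarizes the results after collecting one hundred values for the process (the cell counts are the ones used in the computations that follow):
               ≤ 5 hours   > 5 hours   Total
Importation        15           4        19
Exportation        45          36        81
Total              60          40       100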
Before computing probabilities, it is necessary to define the following events for a product picked at random from the process, according to the data in the table:
• I - event of finding an importation product,
• E - event of finding an exportation product,
• L - event of finding a product that took less than or equal to five hours in the process,
• G - event of finding a product that took more than five hours in the process.
Several questions about simple or marginal probability can now be answered:
P(I) = (15+4)/100 = 19/100 = 0.19
P(E) = (45+36)/100 = 0.81
P(L) = (15+45)/100 = 0.60
P(G) = (4+36)/100 = 0.40
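The four computations above can be reproduced with a short pandas sketch. The cell counts (15, 4, 45, 36) are taken from the sums in the manual computations; the row and column labels, the table orientation, and the variable names are assumptions made here for illustration only.

```python
import pandas as pd

# Cell counts taken from the sums in the manual computations above.
# Labels and orientation (flow in rows, process time in columns) are assumed.
flow = pd.DataFrame(
    data=[[15, 4], [45, 36]],
    index=["Importation", "Exportation"],
    columns=["LE5h", "GT5h"],
)

# Grand total of observations (100 in this experiment)
total = flow.to_numpy().sum()

# Marginal probabilities: sum along one axis, then divide by the grand total
p_flow = flow.sum(axis=1) / total   # P(I) and P(E)
p_time = flow.sum(axis=0) / total   # P(L) and P(G)

print(p_flow)
print(p_time)
```

Summing along axis=1 collapses the process-time columns and leaves one total per flow direction; summing along axis=0 does the opposite.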
The following steps serve as a guideline for extracting the data needed to compute simple or marginal probabilities in Python:
Load the notebook with the commands developed in step 2.4 (click on the link):
https://colab.research.google.com/drive/1CKHXfvPksQHKAoeryYwsAOlaKTz9JIm_?usp=sharing
Consider a second simple situation at a port, where importation containers are divided according to whether an issue was found, grouped by their ProductCode (another option would be to group them by origin, using the historical records of the port). For didactic purposes, suppose the container type can be either Vegetable or FoodProduct.
Let's classify the products by ProductCode and by whether or not they have issues, according to the following table:
              Vegetable   FoodProduct   Total
NoIssues         1900        2900        4800
Issues            100         100         200
Total            2000        3000        5000
Before computing probabilities, it is necessary to define the following events for a product picked at random from the process, according to the data in the table:
• V - event of finding a vegetable product,
• F - event of finding a food product,
• N - event of finding a product without an issue,
• I - event of finding a product with an issue (in this example, I no longer denotes an importation product).
Several questions about simple or marginal probability can now be answered:
P(V) = (1900+100)/5000 = 2000/5000 = 0.40
P(F) = (2900+100)/5000 = 0.60
P(N) = (1900+2900)/5000 = 4800/5000 = 0.96
P(I) = (100+100)/5000 = 200/5000 = 0.04
The previous manual calculations can also be carried out in Python, as in the following code:
import pandas as pd

# Counts of imported products: ProductCode in columns, inspection result in rows
col = ['Vegetable', 'FoodProduct']
row = ['NoIssues', 'Issues']
dat = [[1900, 2900], [100, 100]]
df_impp = pd.DataFrame(data=dat, index=row, columns=col)

# Add the margins: row totals (SumI) first, then column totals (SumP)
df_impp.loc[:, 'SumI'] = df_impp.sum(axis=1)
df_impp.loc['SumP', :] = df_impp.sum(axis=0)

# Marginal probability = margin total / grand total
total = df_impp.loc['SumP', 'SumI']
PNI = df_impp.loc['NoIssues', 'SumI'] / total      # P(N)
PI = df_impp.loc['Issues', 'SumI'] / total         # P(I)
PV = df_impp.loc['SumP', 'Vegetable'] / total      # P(V)
PF = df_impp.loc['SumP', 'FoodProduct'] / total    # P(F)

print("PNI =", PNI)
print("PI =", PI)
print("PV =", PV)
print("PF =", PF)
The code prints the following probability values:
PNI = 0.96
PI = 0.04
PV = 0.4
PF = 0.6
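As a quick check on results like these, the marginals of each category should add up to 1. The sketch below is one possible way to verify that, not the approach used in the course notebook: it first converts the counts into joint relative frequencies and then sums them along each axis. The variable names (counts, joint, p_issue, p_product) are assumptions chosen for this illustration.

```python
import pandas as pd

# Same counts as in the table above
counts = pd.DataFrame(
    data=[[1900, 2900], [100, 100]],
    index=["NoIssues", "Issues"],
    columns=["Vegetable", "FoodProduct"],
)

# Dividing every cell by the grand total gives the joint relative frequencies
joint = counts / counts.to_numpy().sum()

# Summing the joint table along each axis gives the marginal probabilities
p_issue = joint.sum(axis=1)     # P(N) and P(I)
p_product = joint.sum(axis=0)   # P(V) and P(F)

# Each set of marginals should add up to 1 (within floating-point tolerance)
assert abs(p_issue.sum() - 1.0) < 1e-9
assert abs(p_product.sum() - 1.0) < 1e-9

print(p_issue.round(2))
print(p_product.round(2))
```

Working from the joint table also keeps the cell-level frequencies available for later use, which the margin-only computation discards.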
The Python code with all the steps is collected in this Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1T7SaI0gEDf11FEKMB7cXIPNRyHOyaTkC?usp=sharing