The following steps serve as a guideline for extracting summary statistics from the previous data frame:
Load the notebook with the commands developed in step 2.2 (click on the link):
https://colab.research.google.com/drive/1nBs9HfaD2l3BxVTUB48OT-9ZhqMRlnh6?usp=sharing
Apply the command describe() to all columns in the data frame df1:
df1.describe()
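As a minimal sketch of what describe() returns on a whole data frame, the example below uses a small made-up table (the names mixed, Units and Price are hypothetical and not part of df1). By default, pandas summarizes only the numeric columns of a mixed data frame; passing include="all" also covers the text columns.

```python
import pandas as pd

# Hypothetical table, used only for illustration (not the real df1).
mixed = pd.DataFrame({
    "Category": ["A", "B", "C"],
    "Units": [1, 2, 3],
    "Price": [1.5, 2.5, 3.5],
})

# Default behaviour: only the numeric columns are summarized.
default_summary = mixed.describe()

# include="all" adds the non-numeric columns (count, unique, top, freq).
full_summary = mixed.describe(include="all")

print(default_summary)
print(full_summary)
```

If every column of a data frame is stored as text, describe() reports count, unique, top and freq instead of the mean and quartiles.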
To obtain statistics for the column 'January 2022', it is necessary to convert all values to float and then apply the command describe(). Additionally, the result can be stored in a variable called statistics:
statistics = df1['January 2022'].astype(float).describe()
count 9.700000e+01
mean 2.374360e+05
std 4.922685e+05
min 6.420000e+02
25% 1.558200e+04
50% 7.375400e+04
75% 2.266370e+05
max 3.212252e+06
Name: January 2022, dtype: float64
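The effect of the conversion can be sketched on a small made-up column. The example assumes the values are numbers stored as text, which is one common reason a conversion with astype(float) is needed; the data frame sample and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df1: numeric values stored as strings.
sample = pd.DataFrame({
    "Category": ["A", "B", "C", "D"],
    "January 2022": ["100", "200", "300", "400"],
})

# On a text column, describe() reports count, unique, top and freq.
text_summary = sample["January 2022"].describe()

# After conversion to float, describe() reports count, mean, std,
# min, the three quartiles and max.
numeric_summary = sample["January 2022"].astype(float).describe()

print(text_summary)
print(numeric_summary)
```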
The command round(2), applied to the variable statistics, displays the statistical values with two decimals:
statistics.round(2)
count 97.00
mean 237435.99
std 492268.52
min 642.00
25% 15582.00
50% 73754.00
75% 226637.00
max 3212252.00
Name: January 2022, dtype: float64
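Note that round(2) returns a new Series and leaves the original untouched. The sketch below shows this on a hypothetical three-value Series (the names values, stats and rounded are illustrative only).

```python
import pandas as pd

# Hypothetical values, used only for illustration.
values = pd.Series([1.0, 2.0, 4.0])

# Full-precision statistics.
stats = values.describe()

# round(2) builds a rounded copy; stats itself is not modified.
rounded = stats.round(2)

print(stats)
print(rounded)
```

To keep the rounded values for later use, assign the result to a variable, as done with rounded above.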
Another interesting result can be obtained by selecting the ten product categories with the highest values in column 'January 2022' from the variable df1_descendent_order (created in section 2.2). Select the first ten rows and the second and third columns using the command iloc[list(range(0,10)),[1, 2]]. The complete command is as follows:
top_ten_products = df1_descendent_order.iloc[list(range(0,10)),[1, 2]]
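The positional selection can be sketched on a small made-up table. The data frame df, its column Code and the variable names below are hypothetical; the real df1_descendent_order comes from section 2.2. Because range(0, n) yields the positions 0 to n-1, the list form selects exactly n rows, and the slice :n is an equivalent shorthand.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "Code": list("abcdef"),
    "Category": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "January 2022": [50.0, 10.0, 60.0, 30.0, 20.0, 40.0],
})

# Sort from highest to lowest value, similar in spirit to df1_descendent_order.
ordered = df.sort_values("January 2022", ascending=False)

# First three rows, second and third columns, using an explicit list of positions.
top_three_list = ordered.iloc[list(range(0, 3)), [1, 2]]

# The same selection written with a slice.
top_three_slice = ordered.iloc[:3, [1, 2]]

print(top_three_list)
```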
To extract the statistics for the data stored in the variable top_ten_products, apply the same combination of commands used in step 3:
top_ten_statistics = top_ten_products['January 2022'].astype(float).describe()
count 1.100000e+01
mean 1.303623e+06
std 8.975032e+05
min 4.742080e+05
25% 7.611265e+05
50% 9.809560e+05
75% 1.509286e+06
max 3.212252e+06
Name: January 2022, dtype: float64
Again, the command round(2) makes the statistical values easier to read:
top_ten_statistics.round(2)
count 11.00
mean 1303622.64
std 897503.23
min 474208.00
25% 761126.50
50% 980956.00
75% 1509286.50
max 3212252.00
Name: January 2022, dtype: float64
The Python code for all the steps is collected in this Google Colab notebook (click on the link):
https://colab.research.google.com/drive/18AiwXUioJrA3vBclMSlhOEFCO5s_IANb?usp=sharing