Capstone Project 1 - Mind Map
Google Colab notebook with Python code
https://colab.research.google.com/drive/16Gi_HjxdSBGkMvZsyEWUDD8d-1qn8VOH?usp=sharing
Explanation of the code, generated with ChatGPT (first part only)
This notebook performs an exploratory analysis of an e-commerce dataset, using the Pareto principle (80/20 rule) to identify the shipping and customer locations that account for most of the revenue.
Data Loading
Reads a CSV file hosted on Google Drive into a pandas DataFrame.
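As a rough sketch of this step: the notes do not give the file's sharing ID, so FILE_ID below is a hypothetical placeholder, and the helper names drive_csv_url and load_orders are invented for illustration. The direct-download URL pattern is a commonly used convention for publicly shared Drive files and is an assumption here, not something taken from the notebook.

```python
import pandas as pd


def drive_csv_url(file_id: str) -> str:
    """Build a direct-download URL for a publicly shared Google Drive file.

    Assumption: the file is shared as "anyone with the link" and this
    commonly used URL pattern applies to it.
    """
    return f"https://drive.google.com/uc?export=download&id={file_id}"


def load_orders(file_id: str) -> pd.DataFrame:
    """Read the shared CSV into a DataFrame (needs network access)."""
    return pd.read_csv(drive_csv_url(file_id))


# Hypothetical placeholder: substitute the ID from the file's sharing link.
FILE_ID = "YOUR_FILE_ID"
url = drive_csv_url(FILE_ID)
print(url)

# Uncomment once FILE_ID points at a real, publicly shared CSV:
# df = load_orders(FILE_ID)
```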
Data Inspection
Displays the structure and column names of the dataset using df.columns and df.info().
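The inspection step might look like the following on a small stand-in DataFrame. The rows are made up, and ShippingCountry is a hypothetical column name; only WeightKg, ItemCount and Price are named in these notes.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "ShippingCountry": ["DE", "FR", "DE"],  # hypothetical column name
    "Price": [120.0, 80.5, 42.0],
    "ItemCount": [3, 1, 2],
    "WeightKg": [1.5, 0.4, 2.0],
})

# Column names as a plain list.
print(df.columns.tolist())

# df.info() prints its report instead of returning it; passing a buffer
# captures the text so it can be stored or logged as well.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

In a notebook cell, calling df.info() directly is enough; the buffer is only useful when the report needs to be kept.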
Derived Feature: Weight per Unit
A new feature weight_per_unit is computed by dividing the total weight (WeightKg) by the number of items (ItemCount) per order.
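A minimal sketch of the derived feature, on made-up values. Guarding against a zero ItemCount is an assumption added here; the notes do not say whether the notebook handles that case.

```python
import numpy as np
import pandas as pd

# Made-up orders; the last row has no items, to show the guard.
df = pd.DataFrame({
    "WeightKg": [1.5, 0.4, 2.0, 0.9],
    "ItemCount": [3, 1, 2, 0],
})

# Assumption: treat a zero item count as missing, so the division
# yields NaN rather than infinity.
item_count = df["ItemCount"].replace(0, np.nan)
df["weight_per_unit"] = df["WeightKg"] / item_count

print(df)
```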
Pareto Analysis Function
A custom function performs Pareto analysis by:
Grouping the data by a selected category (e.g., shipping country or customer country).
Summing the selected value field (e.g., Price).
Sorting the totals in descending order and computing each category's cumulative percentage of the overall total.
Plotting a Pareto chart: bar plot for values and line plot for cumulative percentage.
Drawing a horizontal line at 80% to visually identify the most impactful categories.
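The steps above can be sketched as a single function. This is not the notebook's actual code: the name pareto_chart, its parameters, the styling, and the sample data (including the ShippingCountry column name) are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def pareto_chart(df, category, value, threshold=80.0):
    """Group, sum, sort, and plot; return the summary table and the figure."""
    # Total of the value field per category, largest first.
    totals = df.groupby(category)[value].sum().sort_values(ascending=False)
    # Running share of the grand total, in percent.
    cum_pct = totals.cumsum() / totals.sum() * 100
    labels = totals.index.astype(str)

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, totals.values)
    ax.set_ylabel(value)
    ax.tick_params(axis="x", labelrotation=45)

    # Second y-axis for the cumulative percentage line.
    ax2 = ax.twinx()
    ax2.plot(labels, cum_pct.values, color="tab:red", marker="o")
    ax2.axhline(threshold, color="gray", linestyle="--")
    ax2.set_ylim(0, 105)
    ax2.set_ylabel("Cumulative %")

    ax.set_title(f"Pareto chart of {value} by {category}")
    fig.tight_layout()

    table = pd.DataFrame({value: totals, "cum_pct": cum_pct})
    return table, fig


# Made-up orders; ShippingCountry is a hypothetical column name.
orders = pd.DataFrame({
    "ShippingCountry": ["DE", "DE", "FR", "US", "IT"],
    "Price": [50, 10, 25, 10, 5],
})
table, fig = pareto_chart(orders, "ShippingCountry", "Price")
print(table)
```

In a notebook the figure renders when the cell finishes; in a script, call plt.show() or fig.savefig(...) afterwards.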
Applications of the Analysis
The Pareto function is applied twice:
Once to evaluate revenue distribution by shipping country.
Once to evaluate revenue distribution by customer country.
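To read the result off as a list rather than a chart, the same grouping can be run once per column. The sketch below is an illustration only: top_categories is a hypothetical helper, and the column names ShippingCountry and CustomerCountry and the data are made up.

```python
import pandas as pd


def top_categories(df, category, value, threshold=80.0):
    """Return the leading categories needed to reach the threshold share."""
    totals = df.groupby(category)[value].sum().sort_values(ascending=False)
    cum_pct = totals.cumsum() / totals.sum() * 100
    # Keep a category if the share accumulated BEFORE it is still below
    # the threshold, so the category that crosses the line is included.
    needed = cum_pct.shift(fill_value=0) < threshold
    return totals[needed].index.tolist()


# Made-up orders with hypothetical country columns.
orders = pd.DataFrame({
    "ShippingCountry": ["DE", "DE", "FR", "US", "IT"],
    "CustomerCountry": ["US", "US", "US", "DE", "FR"],
    "Price": [50, 10, 25, 10, 5],
})

by_shipping = top_categories(orders, "ShippingCountry", "Price")
by_customer = top_categories(orders, "CustomerCountry", "Price")
print("Shipping countries covering 80% of revenue:", by_shipping)
print("Customer countries covering 80% of revenue:", by_customer)
```

Comparing the two lists shows whether revenue is concentrated in the same places on the shipping side and the customer side.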
Pandas Data Manipulation: grouping, sorting, column creation.
Pareto Principle (80/20 Rule): a small number of categories often account for the majority of the effect (e.g., sales).
Matplotlib Visualization: custom plots combining bars and lines for business insights.
Feature Engineering: deriving new data features like weight per unit to aid in analysis.