Validation and Cleaning:
In this phase, I examined the dataset to make sure it was reliable and accurate. I listed the unique sales_method values and corrected labels that were inconsistent because of misspellings or abbreviations, such as "em + call" and "email".
# Correct sales_method values
pens_printers["sales_method"] = pens_printers["sales_method"].replace('em + call', 'Email + Call')
pens_printers["sales_method"] = pens_printers["sales_method"].replace('email', 'Email')
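The two replace calls above can also be written as a single mapping. Checking the unique values before and after cleaning confirms that only the canonical labels remain. This is a minimal sketch on a small hypothetical DataFrame standing in for pens_printers. The corrections dictionary simply restates the two fixes above.

```python
import pandas as pd

# Hypothetical toy data standing in for pens_printers
df = pd.DataFrame({"sales_method": ["Email", "email", "Call", "em + call", "Email + Call"]})

# Inspect the distinct labels before cleaning
print(df["sales_method"].unique())

# Map each known misspelling to its canonical label in one call
corrections = {"em + call": "Email + Call", "email": "Email"}
df["sales_method"] = df["sales_method"].replace(corrections)

# After cleaning, only the three canonical methods should remain
print(sorted(df["sales_method"].unique()))
```

Keeping the corrections in one dictionary makes it easy to add new misspellings if the unique-value check turns up more.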
I also filled missing revenue values with the median revenue of the customer's sales method, and I filtered out outliers so they would not skew the results. This process laid the foundation for a trustworthy analysis.
# Fill missing revenue with the median revenue of the same sales method
pens_printers_dict = pens_printers.groupby('sales_method')['revenue'].median().to_dict()
pens_printers['revenue'] = pens_printers['revenue'].fillna(pens_printers['sales_method'].map(pens_printers_dict))
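The write-up mentions filtering outliers but does not show the rule that was used. The sketch below is a minimal illustration that assumes the common 1.5 × IQR rule, applied separately within each sales method. That rule is an assumption, not necessarily the one used in this project. The sketch also shows groupby(...).transform("median"), which fills the same values as the dictionary-and-map approach above. The data are hypothetical toy values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for pens_printers
df = pd.DataFrame({
    "sales_method": ["Email"] * 5 + ["Call"] * 5,
    "revenue": [90.0, 95.0, np.nan, 100.0, 105.0, 40.0, 45.0, np.nan, 50.0, 900.0],
})

# Step 1: fill each missing revenue with the median revenue of its sales method
df["revenue"] = df["revenue"].fillna(
    df.groupby("sales_method")["revenue"].transform("median")
)

# Step 2 (assumed rule): keep rows within 1.5 * IQR of their own sales method
grouped = df.groupby("sales_method")["revenue"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
in_range = (df["revenue"] >= q1 - 1.5 * iqr) & (df["revenue"] <= q3 + 1.5 * iqr)
df_clean = df[in_range]

print(df_clean)
```

Computing the fences per sales method avoids flagging a whole method as "outliers" simply because its typical revenue differs from the others.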
Exploratory Analysis: Histogram of Revenue:
To understand the revenue distribution for the new product line, I plotted a histogram of revenue for each sales method. It showed that most sales fell in the lower revenue range, with the tallest peak at a revenue of approximately 55.
import matplotlib.pyplot as plt
import seaborn as sns
# Set the figure size
plt.figure(figsize=(10, 6))
# Plot histogram with hue to visualize revenue distribution per sales method
sns.histplot(data=pens_printers, x='revenue', hue='sales_method')
# Add titles to the plot
plt.title('Revenue Distribution per Sales Method')
plt.xlabel('Revenue')
plt.ylabel('Count')
# Show the plot
plt.show()
The histogram also showed a few high-revenue sales in its right tail. Although these sales are rare, they have an outsized effect on total revenue, so they matter when gauging how different transactions contribute to the total.
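One way to put a number on the contribution of the high-revenue tail is to compute the share of total revenue that comes from the largest sales. The sketch below uses a hypothetical helper, top_revenue_share, on made-up revenue values. The 10% cut-off is an illustrative choice, not a figure from this analysis.

```python
import pandas as pd

def top_revenue_share(revenue: pd.Series, fraction: float = 0.10) -> float:
    """Hypothetical helper: fraction of total revenue from the top `fraction` of sales."""
    # Always include at least one sale, even for very small inputs
    top_n = max(1, int(len(revenue) * fraction))
    return revenue.nlargest(top_n).sum() / revenue.sum()

# Made-up revenue values with a long right tail
revenue = pd.Series([40.0, 45.0, 50.0, 52.0, 55.0, 55.0, 58.0, 60.0, 180.0, 230.0])

print(f"Top 10% of sales: {top_revenue_share(revenue, 0.10):.1%} of revenue")
print(f"Top 20% of sales: {top_revenue_share(revenue, 0.20):.1%} of revenue")
```

On the real data, the same call would be top_revenue_share(pens_printers['revenue']).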
Exploratory Analysis: Count Plot of Sales Method:
To visualize the distribution of sales methods, I created a count plot.
# Set the color palette
sns.set_palette('pastel')
# Set the plot style
sns.set_style('whitegrid')
# Set the figure size
plt.figure(figsize=(8, 6))
# Plot count plot to visualize sales_method distribution
sns.countplot(data=pens_printers, x='sales_method')
# Add titles to the plot
plt.title('Sales Method Distribution')
plt.xlabel('Sales Method')
plt.ylabel('Count of Customers')
# Show the plot
plt.show()
The plot showed that the "Email" sales method accounted for the most customers, followed by "Call", while "Email + Call" had the fewest. These counts show how widely each approach was used, but not how effective it was. Conversion rates therefore also need to be considered before making decisions.
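The count plot shows absolute numbers. The same counts can also be expressed as each method's share of customers, which makes comparisons easier. The sketch below uses value_counts with normalize=True on hypothetical toy data. These shares describe reach only; they are not conversion rates.

```python
import pandas as pd

# Hypothetical toy data standing in for pens_printers
df = pd.DataFrame({"sales_method": ["Email"] * 5 + ["Call"] * 3 + ["Email + Call"] * 2})

# Absolute number of customers per method (sorted from most to fewest)
counts = df["sales_method"].value_counts()
# Proportion of all customers reached by each method
shares = df["sales_method"].value_counts(normalize=True)

print(pd.DataFrame({"customers": counts, "share": shares.round(2)}))
```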
Exploratory Analysis: Scatter Plot of Revenue vs. Number of Site Visits:
In this scatter plot, I examined the relationship between the number of site visits and revenue, categorized by sales methods.
# Plot the scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=pens_printers, x='nb_site_visits', y='revenue', hue='sales_method', palette='Set1', alpha=0.7)
plt.title('Scatter Plot: Revenue vs. Number of Site Visits by Sales Method')
plt.xlabel('Number of Site Visits')
plt.ylabel('Revenue')
plt.legend(title='Sales Method', loc='upper left')
plt.show()
The plot revealed a positive correlation between site visits and revenue for every sales method. Notably, the "Email + Call" method formed a distinct cluster with higher revenue at a moderate number of site visits, suggesting its potential effectiveness.
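A scatter plot gives a visual impression of correlation. Computing the Pearson coefficient for each sales method makes the comparison concrete. The sketch below uses hypothetical toy values with the same column names as in the plot.

```python
import pandas as pd

# Hypothetical toy data using the column names from the scatter plot
df = pd.DataFrame({
    "sales_method": ["Email"] * 4 + ["Call"] * 4,
    "nb_site_visits": [20, 22, 25, 28, 20, 22, 25, 28],
    "revenue": [80.0, 88.0, 100.0, 112.0, 40.0, 45.0, 44.0, 52.0],
})

# Pearson correlation between site visits and revenue within each sales method
corr_by_method = df.groupby("sales_method")[["nb_site_visits", "revenue"]].apply(
    lambda g: g["nb_site_visits"].corr(g["revenue"])
)
print(corr_by_method.round(3))
```

A high coefficient indicates that the two quantities move together within a method. It does not by itself show that more site visits cause higher revenue.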
Metric Definition: Conversion Rate of Sales Methods:
To measure the effectiveness of each sales approach, I defined the "Conversion Rate of Sales Methods" as our key metric: the percentage of customers who made a purchase after being exposed to a given sales method. Tracking this rate weekly will show how well each sales strategy is performing. As a starting point, the code below computes each sales method's share of total revenue.
# Calculate total revenue from each sales method
revenue_by_method = pens_printers.groupby('sales_method')['revenue'].sum()
# Calculate the total revenue from all methods
total_revenue = pens_printers['revenue'].sum()
# Calculate the percentage of revenue for each sales method
sales_method_percentages = (revenue_by_method / total_revenue * 100).round(2)
print(sales_method_percentages)
# Create a bar plot of each sales method's share of total revenue
plt.figure(figsize=(8, 6))
sns.barplot(x=sales_method_percentages.index, y=sales_method_percentages.values)
plt.title('Share of Total Revenue by Sales Method')
plt.xlabel('Sales Method')
plt.ylabel('Percentage of Revenue')
plt.show()
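The code above measures each method's share of revenue, which is not the same as the conversion rate defined earlier. Computing conversion also requires a record of customers who were contacted but did not buy. The sketch below shows how the weekly conversion rate could be tracked if such data were available. It assumes a hypothetical contacts table with week, sales_method and a boolean purchased column. Neither the table nor the purchased column appears in this project.

```python
import pandas as pd

# Hypothetical contact log: one row per customer contacted, and whether they bought
contacts = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2, 2, 2],
    "sales_method": ["Email", "Email", "Call", "Call", "Email", "Email", "Call", "Call"],
    "purchased": [True, False, True, True, True, True, False, True],
})

# Conversion rate (%) = purchasers / customers contacted, per week and sales method.
# The mean of a boolean column is the fraction of True values.
weekly_conversion = (
    contacts.groupby(["week", "sales_method"])["purchased"]
    .mean()
    .mul(100)
    .unstack("sales_method")
)
print(weekly_conversion)
```

Each row of the result is one week, so the table can be appended to or plotted as a line chart to monitor the metric over time.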
Recommendations
Given the strong positive correlation between revenue and site visits for the "Email + Call" method, I recommend prioritizing and optimizing this approach. Emphasizing "Email + Call" could lead to higher revenue and improved customer engagement.
Because the "Call" method appears to be less effective, I recommend discontinuing it. I also suggest investigating state-level differences among customers to identify regional variations in the new product line's success.
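A simple first step for the state-level investigation is a pivot table of average revenue by state and sales method. The sketch below assumes a state column in the data. That column is not used elsewhere in this write-up, so treat it as an assumption. The values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the "state" column is an assumed field
df = pd.DataFrame({
    "state": ["Texas", "Texas", "Ohio", "Ohio", "Ohio"],
    "sales_method": ["Email", "Call", "Email", "Email", "Call"],
    "revenue": [95.0, 45.0, 90.0, 100.0, 50.0],
})

# Average revenue per customer for each state (rows) and sales method (columns)
state_summary = df.pivot_table(
    index="state", columns="sales_method", values="revenue", aggfunc="mean"
)
print(state_summary)
```

Adding a customer count per cell (for example with aggfunc="count") would help flag states where the averages rest on very few sales.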
To increase website visits and revenue, I recommend improving email content and the website's presentation. Engaging emails and an appealing website design can encourage more site visits and boost overall revenue.
Conclusion
This portfolio project provided valuable insights into sales of Pens and Printers' new product line. By validating and analyzing the data, we uncovered key patterns and correlations that informed the recommendations above. I am confident that optimizing sales methods and tracking key metrics will help Pens and Printers achieve its revenue goals and stay competitive in the market.